Models, code, and papers for "Guang-Zhong Yang":
Accurate volume segmentation from Computed Tomography (CT) scans is a common prerequisite for pre-operative planning, intra-operative guidance and quantitative assessment of therapeutic outcomes in robot-assisted Minimally Invasive Surgery (MIS). The use of a 3D Deep Convolutional Neural Network (DCNN) is a viable solution for this task but is memory intensive. The use of patch division can mitigate this issue in practice, but can cause discontinuities between adjacent patches and severe class imbalance within individual sub-volumes. This paper presents a new patch division approach, Patch-512, to tackle the class-imbalance issue by preserving a full field-of-view of the objects in the XY planes. To achieve better segmentation results based on these asymmetric patches, a 3D DCNN architecture using asymmetrical separable convolutions is proposed. The proposed network, called Z-Net, can be seamlessly integrated into existing 3D DCNNs such as 3D U-Net and V-Net, for improved volume segmentation. Detailed validation of the method is provided for CT aortic, liver and lung segmentation, demonstrating the effectiveness and practical value of the method for intra-operative 3D navigation in robot-assisted MIS.
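As a rough illustration of the patch division idea described above, the sketch below splits a volume only along the Z axis, so every patch keeps the full XY field-of-view. The function name `z_patch_ranges`, the patch depth, and the stride are hypothetical choices for illustration; the abstract does not state the actual patch depth or overlap used by Patch-512.

```python
def z_patch_ranges(depth, patch_depth, stride):
    """Return (start, stop) index pairs along Z that cover the whole volume.

    Each patch is assumed to keep every XY pixel of its slices, so only the
    Z extent is divided. patch_depth and stride are illustrative assumptions.
    """
    if patch_depth >= depth:
        # The volume is thinner than one patch: use it whole.
        return [(0, depth)]
    starts = list(range(0, depth - patch_depth + 1, stride))
    if starts[-1] + patch_depth < depth:
        # Add a final patch flush with the last slice so nothing is dropped.
        starts.append(depth - patch_depth)
    return [(s, s + patch_depth) for s in starts]


# Example: a 100-slice scan cut into 32-slice patches without overlap,
# except for the final patch, which is shifted back to reach the last slice.
ranges = z_patch_ranges(depth=100, patch_depth=32, stride=32)
```

With a volume stored as `volume[z][y][x]`, a patch would then be taken as `volume[start:stop]`, leaving the Y and X axes untouched.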
Normalization layers are essential in a Deep Convolutional Neural Network (DCNN). Various normalization methods have been proposed. The statistics used to normalize the feature maps can be computed at batch, channel, or instance level. However, in most existing methods, the normalization for each layer is fixed. Batch-Instance Normalization (BIN) is one of the first proposed methods that combines two different normalization methods and achieves diverse normalization for different layers. However, two potential issues exist in BIN: first, the Clip function is not differentiable at input values of 0 and 1; second, the combined feature map does not have a normalized distribution, which is harmful for signal propagation in a DCNN. In this paper, an Instance-Layer Normalization (ILN) layer is proposed by using the Sigmoid function for the feature map combination, and cascading group normalization. The performance of ILN is validated on image segmentation of the Right Ventricle (RV) and Left Ventricle (LV) using U-Net as the network architecture. The results show that the proposed ILN outperforms previous traditional and popular normalization methods with noticeable accuracy improvements for most validations, supporting the effectiveness of the proposed ILN.
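The combination step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: a single sample is represented as a list of channels (each a flat list of values), the gate is a single scalar `rho` passed through a sigmoid, and the mixing formula `gate * IN(x) + (1 - gate) * LN(x)` is an assumed form. The cascaded group normalization re-normalizes the mixed feature map. All function names are hypothetical.

```python
import math


def _normalize(values, eps=1e-5):
    """Zero-mean, unit-variance normalization of a flat list of values."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]


def instance_norm(x):
    """Normalize each channel of one sample separately."""
    return [_normalize(channel) for channel in x]


def layer_norm(x):
    """Normalize all channels of one sample together."""
    width = len(x[0])
    flat = _normalize([v for channel in x for v in channel])
    return [flat[i * width:(i + 1) * width] for i in range(len(x))]


def group_norm(x, groups):
    """Normalize channels in equal-sized groups (len(x) divisible by groups)."""
    size = len(x) // groups
    out = []
    for g in range(groups):
        out.extend(layer_norm(x[g * size:(g + 1) * size]))
    return out


def sigmoid_gated_norm(x, rho, groups):
    """Mix IN and LN with a sigmoid gate, then apply group normalization."""
    gate = 1.0 / (1.0 + math.exp(-rho))  # smooth and differentiable everywhere
    inst, layer = instance_norm(x), layer_norm(x)
    mixed = [[gate * a + (1.0 - gate) * b for a, b in zip(ca, cb)]
             for ca, cb in zip(inst, layer)]
    return group_norm(mixed, groups)


sample = [[1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]]
out = sigmoid_gated_norm(sample, rho=0.0, groups=1)
```

In a trained network `rho` would be a learnable parameter; here it is fixed so the example stays self-contained.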
3D shape instantiation, which reconstructs the 3D shape of a target from limited 2D images or projections, is an emerging technique for surgical intervention. It improves the currently less-informative and insufficient 2D navigation schemes for robot-assisted Minimally Invasive Surgery (MIS) to 3D navigation. Previously, a general and registration-free framework was proposed for 3D shape instantiation based on Kernel Partial Least Square Regression (KPLSR), requiring manually segmented anatomical structures as the pre-requisite. Two hyper-parameters, the Gaussian width and the component number, also need to be carefully adjusted. A Deep Convolutional Neural Network (DCNN) based framework has also been proposed to reconstruct a 3D point cloud from a single 2D image, with end-to-end and fully automatic learning. In this paper, an Instantiation-Net is proposed to reconstruct the 3D mesh of a target from a single 2D image, by using a DCNN to extract features from the 2D image, a Graph Convolutional Network (GCN) to reconstruct the 3D mesh, and Fully Connected (FC) layers to connect the DCNN to the GCN. Detailed validation was performed to demonstrate the practical strength of the method and its potential clinical use.
Shape instantiation, which predicts the 3D shape of a dynamic target from one or more 2D images, is important for real-time intra-operative navigation. Previously, a general shape instantiation framework was proposed with manual image segmentation to generate a 2D Statistical Shape Model (SSM) and with Kernel Partial Least Square Regression (KPLSR) to learn the relationship between the 2D and 3D SSM for 3D shape prediction. In this paper, the two-stage shape instantiation is improved to be one-stage. PointOutNet with 19 convolutional layers and three fully-connected layers is used as the network structure and Chamfer distance is used as the loss function to predict the 3D target point cloud from a single 2D image. With the proposed one-stage shape instantiation algorithm, spontaneous image-to-point cloud training and inference can be achieved. A dataset from 27 Right Ventricle (RV) subjects, amounting to 609 experiments, was used to validate the proposed one-stage shape instantiation algorithm. An average point cloud-to-point cloud (PC-to-PC) error of 1.72 mm has been achieved, which is comparable to the PLSR-based (1.42 mm) and KPLSR-based (1.31 mm) two-stage shape instantiation algorithms.
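For readers unfamiliar with the loss mentioned above, here is a minimal sketch of one common variant of the Chamfer distance between two point clouds: the mean squared nearest-neighbour distance, summed over both directions. The abstract does not say which variant (squared or unsquared, summed or averaged) was used, so the exact form and the name `chamfer_distance` are assumptions.

```python
def chamfer_distance(cloud_a, cloud_b):
    """Symmetric Chamfer distance between two non-empty point clouds.

    Each cloud is a list of equal-length coordinate tuples. This variant
    averages squared nearest-neighbour distances in each direction and adds
    the two averages; other definitions exist.
    """
    def squared_distance(p, q):
        return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

    def one_way(source, target):
        # For every source point, find its closest target point.
        return sum(min(squared_distance(p, q) for q in target)
                   for p in source) / len(source)

    return one_way(cloud_a, cloud_b) + one_way(cloud_b, cloud_a)


predicted = [(0.0, 0.0, 0.0)]
reference = [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
loss = chamfer_distance(predicted, reference)
```

The brute-force nearest-neighbour search is quadratic in the number of points, which is fine for a small illustration but would normally be replaced by a vectorized or tree-based search.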
Machine learning and data analysis have been used in many robotics fields, especially for modelling. Data are usually the result of sensor measurements and, as such, they might be subjected to noise and outliers. The presence of outliers has a huge impact on modelling the acquired data, resulting in inappropriate models. In this work, a novel approach for outlier detection and rejection for input/output mapping in regression problems is presented. The robustness of the method is shown both through simulated data for linear and nonlinear regression, and real sensory data. Despite being validated by using artificial neural networks, the method can be generalized to any other regression method.
2D bio-medical semantic segmentation is important for surgical robotic vision. Segmentation methods based on Deep Convolutional Neural Networks (DCNNs) out-perform conventional methods in terms of both accuracy and automation. One common issue in training a DCNN is the internal covariate shift, where the convolutional kernels are trained to fit the distribution change of the input features, hence both the training speed and performance are decreased. Batch Normalization (BN) is the first proposed method for addressing internal covariate shift and is widely used. Later, Instance Normalization (IN) and Layer Normalization (LN) were proposed and are used much less than BN. Group Normalization (GN) was proposed very recently and has not yet been applied to 2D bio-medical semantic segmentation. Most DCNN-based bio-medical semantic segmentation adopts BN as the normalization method by default, without reviewing its performance. In this paper, four normalization methods (BN, IN, LN and GN) are compared and reviewed in detail specifically for 2D bio-medical semantic segmentation. The results showed that GN out-performed the other three normalization methods (BN, IN and LN) in 2D bio-medical semantic segmentation regarding both accuracy and robustness. U-Net is adopted as the basic DCNN structure. 37 RVs from both asymptomatic and Hypertrophic Cardiomyopathy (HCM) subjects and 20 aortas from asymptomatic subjects were used for the validation. The code and trained models will be available online.
In robotic surgery, task automation and learning from demonstration combined with human supervision is an emerging trend for many new surgical robot platforms. One such task is automated anastomosis, which requires bimanual needle handling and suture detection. Due to the complexity of the surgical environment and varying patient anatomies, reliable suture detection is difficult, which is further complicated by occlusion and thread topologies. In this paper, we propose a multi-stage framework for suture thread detection based on deep learning. Fully convolutional neural networks are used to obtain the initial detection and the overlapping status of suture thread, which are later fused with the original image to learn a gradient road map of the thread. Based on the gradient road map, multiple segments of the thread are extracted and linked to form the whole thread using a curvilinear structure detector. Experiments on two different types of sutures demonstrate the accuracy of the proposed framework.
This paper presents a versatile robotic system for sewing 3D structured objects. Leveraging a customized robotic sewing device and closed-loop visual servoing control, an all-in-one solution for sewing personalized stent grafts is demonstrated. Stitch size planning and automatic knot tying are proposed as the two key functions of the system. By using effective stitch size planning, sub-millimetre sewing accuracy is achieved for stitch sizes ranging from 2 mm to 5 mm. In addition, a thread manipulator for thread management and tension control is also proposed to perform successive knot tying to secure each stitch. Detailed laboratory experiments have been performed to assess the proposed instruments and allied algorithms. The proposed framework can be generalised to a wide range of applications including 3D industrial sewing, as well as transferred to other clinical areas such as surgical suturing.
Echocardiography plays an important part as a diagnostic aid in cardiac diseases. A critical step in echocardiography-aided diagnosis is to extract the standard planes, since they tend to provide promising views of the different structures that are beneficial to diagnosis. To this end, this paper proposes a spatial-temporal embedding framework to extract the standard view planes from 4D STIC (spatial-temporal image correlation) volumes. The proposed method comprises three stages: frame smoothing, spatial-temporal embedding and final classification. In the first stage, an L0 smoothing filter is used to preprocess the frames, removing the noise while preserving the boundaries. Then a compact representation is learned by embedding spatial and temporal features into a latent space in a supervised scheme that considers both the standard plane information and the diagnosis result. In the last stage, the learned features are fed into a support vector machine to identify the standard plane. We evaluate the proposed method on a 4D STIC volume dataset with 92 normal cases and 93 abnormal cases in three standard planes. The evaluation demonstrates that our method outperforms the baselines in both classification accuracy and computational efficiency.
In robot-assisted Fenestrated Endovascular Aortic Repair (FEVAR), accurate alignment of stent graft fenestrations or scallops with aortic branches is essential for establishing complete blood flow perfusion. Current navigation is largely based on 2D fluoroscopic images, which lack 3D anatomical information, thus causing longer operation time as well as high risks of radiation exposure. Previously, 3D shape instantiation frameworks for real-time 3D shape reconstruction of a fully-deployed or fully-compressed stent graft from a single 2D fluoroscopic image have been proposed for 3D navigation in robot-assisted FEVAR. However, these methods could not instantiate partially-deployed stent segments, as the 3D marker references are unknown. In this paper, an adapted Graph Convolutional Network (GCN) is proposed to predict 3D marker references from 3D fully-deployed markers. As the original GCN is for classification, in this paper, the coarsening layers are removed and the softmax function at the network end is replaced with linear mapping for the regression task. The derived 3D and the 2D marker references are used to instantiate the partially-deployed stent segment shape with the existing 3D shape instantiation framework. Validations were performed on three commonly used stent grafts and five patient-specific 3D printed aortic aneurysm phantoms. Comparable performances with average mesh distance errors of 1 to 3 mm and average angular errors around 7 degrees were achieved.
Deep Convolutional Neural Networks (DCNNs) are showing impressive performances in biomedical semantic segmentation. However, current DCNNs usually use down-sampling layers to increase the receptive field significantly and to gain abstract semantic information. These down-sampling layers decrease the spatial dimension of feature maps as well, which is harmful for semantic segmentation. Atrous convolution is an alternative to the down-sampling layer. It can increase the receptive field significantly while maintaining the spatial dimension of feature maps. In this paper, firstly, an atrous rate setting is proposed to achieve the largest and fully-covered receptive field with a minimum number of layers. Secondly, six atrous blocks, three shortcut connections and four normalization methods are explored to select the optimal atrous block, shortcut connection and normalization method. Finally, a new and dimensionally lossless DCNN, the Atrous Convolutional Neural Network (ACNN), is proposed using cascaded atrous II-blocks, residual learning and Fine Group Normalization (FGN). The Right Ventricle (RV), Left Ventricle (LV) and aorta data are used for the validation. The results show that the proposed ACNN achieves comparable segmentation Dice Similarity Coefficients (DSCs) with U-Net, optimized U-Net and the hybrid network, but uses far fewer parameters. This advantage is considered to benefit from the dimensionally lossless feature maps.
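To make the notion of a "largest and fully-covered receptive field" concrete, the sketch below works in one dimension with stride-1 atrous convolutions of odd kernel size. It computes the receptive field span of a stack of layers and checks whether every input position inside that span can actually reach the output, or whether the chosen rates leave gaps. The function names and the example rate lists are illustrative assumptions; the abstract does not give the paper's actual rate setting.

```python
def receptive_field(rates, kernel=3):
    """Span of input positions seen by stacked stride-1 atrous convolutions."""
    return 1 + sum((kernel - 1) * rate for rate in rates)


def covered_offsets(rates, kernel=3):
    """Input offsets (relative to the output position) that reach the output.

    Assumes an odd kernel size, so taps sit at -half..half times the rate.
    """
    half = kernel // 2
    offsets = {0}
    for rate in rates:
        offsets = {o + tap * rate
                   for o in offsets
                   for tap in range(-half, half + 1)}
    return offsets


def is_fully_covered(rates, kernel=3):
    """True when no input position inside the receptive field is skipped."""
    return len(covered_offsets(rates, kernel)) == receptive_field(rates, kernel)


# Hypothetical rate lists for three 3-tap layers.
growing_rates = [1, 3, 9]    # span of 27 positions, every one of them used
constant_rates = [2, 2, 2]   # span of 13 positions, odd offsets never used
```

The check only asks whether a position is connected to the output at all; it says nothing about how strongly that position is weighted.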
Real-time 3D navigation during minimally invasive procedures is an essential yet challenging task, especially when considerable tissue motion is involved. To balance image acquisition speed and resolution, only 2D images or low-resolution 3D volumes can be used clinically. In this paper, a real-time and registration-free framework for dynamic shape instantiation, generalizable to multiple anatomical applications, is proposed to instantiate high-resolution 3D shapes of an organ from a single 2D image intra-operatively. Firstly, an approximate optimal scan plane was determined by analyzing the pre-operative 3D statistical shape model (SSM) of the anatomy with sparse principal component analysis (SPCA) and considering practical constraints. Secondly, kernel partial least squares regression (KPLSR) was used to learn the relationship between the pre-operative 3D SSM and a synchronized 2D SSM constructed from 2D images obtained at the approximate optimal scan plane. Finally, the derived relationship was applied to the new intra-operative 2D image obtained at the same scan plane to predict the high-resolution 3D shape intra-operatively. A major feature of the proposed framework is that no extra registration between the pre-operative 3D SSM and the synchronized 2D SSM is required. Detailed validation was performed on studies including the liver and right ventricle (RV) of the heart. The derived results (mean accuracy of 2.19 mm on patients and computation speed of 1 ms) demonstrate its potential clinical value for real-time, high-resolution, dynamic and 3D interventional guidance.
In robotic surgery, tool tracking is important for providing safe tool-tissue interaction and facilitating surgical skills assessment. Despite recent advances in tool tracking, existing approaches are faced with major difficulties in real-time tracking of articulated tools. Most algorithms are tailored for offline processing with pre-recorded videos. In this paper, we propose a real-time 3D tracking method for articulated tools in robotic surgery. The proposed method is based on the CAD model of the tools as well as robot kinematics to generate online part-based templates for efficient 2D matching and 3D pose estimation. A robust verification approach is incorporated to reject outliers in 2D detections, which is then followed by fusing inliers with robot kinematic readings for 3D pose estimation of the tool. The proposed method has been validated with phantom data, as well as ex vivo and in vivo experiments. The results derived clearly demonstrate the performance advantage of the proposed method when compared to the state-of-the-art.
Artificial Intelligence (AI) is gradually changing the practice of surgery with the advanced technological development of imaging, navigation and robotic intervention. In this article, the recent successful and influential applications of AI in surgery are reviewed from pre-operative planning and intra-operative guidance to the integration of surgical robots. We end with summarizing the current state, emerging trends and major challenges in the future development of AI in surgery.
Soft wearable robots are a promising new design paradigm for rehabilitation and active assistance applications. Their compliant nature makes them ideal for complex joints like the shoulder, but intuitive control of these robots requires robust and compliant sensing mechanisms. In this work, we introduce the sensing framework for a multi-DoF shoulder exosuit capable of sensing the kinematics of the shoulder joint. The proposed tendon-based sensing system is inspired by the concept of muscle synergies, the body's sense of proprioception, and finds its basis in the organization of the muscles responsible for shoulder movements. A motion-capture-based evaluation of the developed sensing system showed conformance to the behaviour exhibited by the muscles that inspired its routing and validates the hypothesis of the tendon-routing to be extended to the actuation framework of the exosuit in the future. The mapping from multi-sensor space to joint space is a multivariate multiple regression problem and was derived using an Artificial Neural Network (ANN). The sensing framework was tested with a motion-tracking system and achieved performance with root mean square error (RMSE) of approximately 5.43 degrees and 3.65 degrees for the azimuth and elevation joint angles, respectively, measured over 29000 frames (4+ minutes) of motion-capture data.
The recent successes of AI have captured the wildest imagination of both the scientific communities and the general public. Robotics and AI amplify human potential, increase productivity and are moving from simple reasoning towards human-like cognitive abilities. Current AI technologies are used in a range of application areas, from healthcare, manufacturing, transport and energy to financial services, banking, advertising, management consulting and government agencies. The global AI market was around 260 billion USD in 2016 and it is estimated to exceed 3 trillion by 2024. To understand the impact of AI, it is important to draw lessons from its past successes and failures, and this white paper provides a comprehensive explanation of the evolution of AI, its current status and future directions.
This paper presents a vision-based learning-by-demonstration approach to enable robots to learn and complete a manipulation task cooperatively. With this method, a vision system is involved in both the task demonstration and reproduction stages. An expert first demonstrates how to use tools to perform a task, while the tool motion is observed using a vision system. The demonstrations are then encoded using a statistical model to generate a reference motion trajectory. Equipped with the same tools and the learned model, the robot is guided by vision to reproduce the task. The task performance was evaluated in terms of both accuracy and speed. However, simply increasing the robot's speed could decrease the reproduction accuracy. To this end, a dual-rate Kalman filter is employed to compensate for latency between the robot and vision system. More importantly, the sampling rates of the reference trajectory and the robot speed are optimised adaptively according to the learned motion model. We demonstrate the effectiveness of our approach by performing two tasks: a trajectory reproduction task and a bimanual sewing task. We show that using our vision-based approach, the robots can conduct effective learning by demonstrations and perform accurate and fast task reproduction. The proposed approach is generalisable to other manipulation tasks, where bimanual or multi-robot cooperation is required.
The current standard of intra-operative navigation during Fenestrated Endovascular Aortic Repair (FEVAR) calls for 3D alignment between inserted devices and aortic branches. Navigation, commonly performed via 2D fluoroscopic images, lacks anatomical information, resulting in longer operation hours and radiation exposure. In this paper, a framework for real-time 3D robotic path planning from a single 2D fluoroscopic image of Abdominal Aortic Aneurysm (AAA) is introduced. A graph matching method is proposed to establish the correspondence between the 3D pre-operative and 2D intra-operative AAA skeletons, and then the two skeletons are registered by skeleton deformation and regularization with respect to skeleton length and smoothness. Furthermore, deep learning was used to segment the 3D pre-operative AAA from Computed Tomography (CT) scans to facilitate the framework automation. Simulation, phantom and patient AAA data sets have been used to validate the proposed framework. A 3D distance error of 2 mm was achieved in the phantom setup. Performance advantages were also achieved in terms of accuracy, robustness and time-efficiency. All the code will be open source.
Purpose: Advancements in MRI Tissue Phase Velocity Mapping (TPM) allow for the acquisition of higher quality velocity cardiac images providing better assessment of regional myocardial deformation for accurate disease diagnosis, pre-operative planning and post-operative patient surveillance. Translation of TPM velocities from the scanner's reference coordinate system to the regional cardiac coordinate system requires decoupling of translational motion and motion due to myocardial deformation. Despite existing techniques for respiratory motion compensation in TPM, there is still a remaining translational velocity component due to the global motion of the beating heart. To compensate for translational motion in cardiac TPM, we propose an image-processing method, which we have evaluated on synthetic data and applied on in vivo TPM data. Methods: Translational motion is estimated from a suitable region of velocities automatically defined in the left-ventricular volume. The region is generated by dilating the medial axis of myocardial masks in each slice and the translational velocity is estimated by integration in this region. The method was evaluated on synthetic data and in vivo data corrupted with a translational velocity component (200% of the maximum measured velocity). Accuracy and robustness were examined and the method was applied on 10 in vivo datasets. Results: The results from synthetic and in vivo corrupted data show excellent performance with an estimation error less than 0.3% and high robustness in both cases. The effectiveness of the method is confirmed with visual observation of results from the 10 datasets. Conclusion: The proposed method is accurate and suitable for translational motion correction of the left ventricular velocity fields. The current method for translational motion compensation could be applied to any annular contracting (tissue) structure.
Robot-assisted Fenestrated Endovascular Aortic Repair (FEVAR) is currently navigated by 2D fluoroscopy, which is insufficiently informative. Previously, a semi-automatic 3D shape instantiation method was developed to instantiate the 3D shape of a main, deployed, and fenestrated stent graft from a single fluoroscopy projection in real-time, which could help 3D FEVAR navigation and robotic path planning. This semi-automatic method was based on the Robust Perspective-5-Point (RP5P) method, graft gap interpolation and semi-automatic multiple-class marker center determination. In this paper, automatic 3D shape instantiation is achieved by automatic multiple-class marker segmentation and hence automatic multiple-class marker center determination. Firstly, the markers were designed into five different shapes. Then, Equally-weighted Focal U-Net was proposed to segment the fluoroscopy projections of customized markers into five classes and hence to determine the marker centers. The proposed Equally-weighted Focal U-Net utilized U-Net as the network architecture, an equally-weighted loss function for initial marker segmentation, and then an equally-weighted focal loss function for improving the initial marker segmentation. This proposed network outperformed the traditional Weighted U-Net on the class-imbalance segmentation in this paper while removing one hyper-parameter, the weight. An overall mean Intersection over Union (mIoU) of 0.6943 was achieved on 78 testing images, where 81.01% of markers were segmented with a center position error <1.6 mm. Comparable accuracy of 3D shape instantiation was also achieved and stated. The data, trained models and TensorFlow code are available on-line.
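The two quantities named above, a focal loss without per-class weights and the mean Intersection over Union, can be sketched as follows. This is an illustration under assumptions rather than the released TensorFlow code: pixels are given as flat lists, the focal loss uses the standard form `-(1 - p_t)^gamma * log(p_t)` with `gamma = 2.0` as an assumed default, and classes absent from both prediction and ground truth are skipped when averaging the IoU. The function names are hypothetical.

```python
import math


def focal_loss(probabilities, labels, gamma=2.0):
    """Mean focal loss over pixels, with every class weighted equally.

    probabilities[i] holds the predicted class probabilities for pixel i and
    labels[i] is its true class index. gamma=2.0 is an assumed default.
    """
    total = 0.0
    for probs, label in zip(probabilities, labels):
        p_true = max(probs[label], 1e-12)  # guard against log(0)
        # Well-classified pixels (p_true near 1) are down-weighted.
        total += -((1.0 - p_true) ** gamma) * math.log(p_true)
    return total / len(labels)


def mean_iou(predicted, truth, num_classes):
    """Mean Intersection over Union across classes present in either input."""
    ious = []
    for cls in range(num_classes):
        intersection = sum(1 for p, t in zip(predicted, truth)
                           if p == cls and t == cls)
        union = sum(1 for p, t in zip(predicted, truth)
                    if p == cls or t == cls)
        if union:
            ious.append(intersection / union)
    return sum(ious) / len(ious) if ious else 0.0


# Tiny two-class example with four pixels.
score = mean_iou(predicted=[0, 1, 1, 0], truth=[0, 1, 0, 0], num_classes=2)
```

Setting `gamma=0` reduces the focal loss to ordinary cross-entropy, which is a convenient sanity check.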