Article

End-to-End 3D Liver CT Image Synthesis from Vasculature Using a Multi-Task Conditional Generative Adversarial Network

1 School of Computing and Electronic Informatics, Guangxi University, Nanning 530004, China
2 Precision Medicine Research Center, Taihe Hospital, Hubei University of Medicine, Shiyan 442000, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2023, 13(11), 6784; https://doi.org/10.3390/app13116784
Submission received: 11 April 2023 / Revised: 24 May 2023 / Accepted: 1 June 2023 / Published: 2 June 2023
(This article belongs to the Section Biomedical Engineering)

Abstract
Acquiring relevant, high-quality, and heterogeneous medical images is essential for many types of automated analysis and for a variety of downstream data augmentation tasks. However, real image samples are expensive to obtain in large numbers, especially for 3D medical images, so there is an urgent need to synthesize realistic 3D medical images. Existing generator models, however, have poor stability and lack the guidance of prior medical knowledge. To this end, we propose a multi-task (i.e., segmentation task and generation task) 3D generative adversarial network (GAN) for the synthesis of 3D liver CT images (3DMT-GAN). To the best of our knowledge, this is the first application to the 3D liver CT image synthesis task. Specifically, we use a vascular segmentation mask as the input because it contains rich structural information about a variety of anatomical structures. We use the semantic mask of the liver as prior medical knowledge to guide the generation of the 3D CT image, reducing the computation spent on the large background and thus making the model focus on generating the liver region. In addition, we introduce a stable multiple gradient descent algorithm (MGDA) into our model to balance the weights of the multi-task framework. Experiments were conducted on a real dataset; the segmentation task achieves a Dice similarity coefficient (DSC) of 0.87, while the synthesis task outperforms existing state-of-the-art methods. This study demonstrates the feasibility of using vascular images to synthesize images of the liver.

1. Introduction

With the advancement of medical imaging and computer graphics technology, 3D liver image reconstructions play an increasingly significant role in medical education, technician training, and simulated surgery [1,2]. However, acquiring complete and high-quality 3D liver CT images necessitates patients undergoing a full CT tomography procedure. Such data inherently involve patient privacy concerns and regional differences in medical equipment conditions. Additionally, the limited number of images for certain rare diseases makes it difficult for physicians to identify and learn from relevant cases.
Recently, deep learning (DL)-based liver image analysis has achieved great success in the medical diagnosis field [3,4,5]. However, DL-based methods require a large number of image samples for learning and training. Collecting and labelling medical image samples is currently a labour-intensive task that consumes considerable manpower and material resources, and it is further limited by medical ethics and patient privacy issues. It is therefore very difficult to obtain real medical image samples at this stage, especially 3D medical images [6,7,8,9]. DL-based image generation techniques make it possible to obtain realistic synthetic images, providing a new way to augment such datasets. Nowadays, generative adversarial network (GAN)-derived image generation models have demonstrated remarkable results in the field of image synthesis [10,11], effectively addressing the difficulty of obtaining substantial amounts of medical image data.
However, previous GAN-based generation methods are unstable and have high uncertainty [12,13,14]. Furthermore, unlike natural images, medical images often contain anatomical knowledge, i.e., prior clinical knowledge, which previous medical image synthesis methods have failed to consider in depth [6,15]. In addition, most previous work has focused on generating 2D medical images, with few methods addressing the generation of 3D medical images. In this paper, we propose a multi-task 3D generative adversarial network (GAN), with a liver mask segmentation task and a liver generation task, to generate 3D liver CT images from vascular segmentation labels (see Figure 1). The contributions of this paper can be summarized as follows:
  • We propose a multi-task conditional generative adversarial model that synthesizes 3D liver CT images by taking liver vessel labels as input to provide a priori anatomical structure information and using real liver segmentation labels for guidance. To the best of our knowledge, this is the first application to 3D liver CT image generation.
  • We introduce a robust multiple gradient method to optimize the multiple tasks, specifically by balancing the weights of the individual tasks in the multi-task framework.
  • Extensive experiments were conducted on the collected real data, and the results show that our method outperforms existing state-of-the-art methods. In addition, this study demonstrates the possibility of using vascular images to synthesize images of the liver.

2. Related Work

2.1. GAN-Based Model

Generative adversarial networks (GANs) [12] first introduced the adversarial relationship between a generator and a discriminator for synthesizing pseudo-data that emulate the distribution of the target dataset. Subsequently, Radford et al. [13] replaced the original fully connected generator with convolutional neural networks (CNNs), which significantly improved the quality and stability of the synthesized images. Because the adversarial learning mechanism is unstable and prone to gradient vanishing or mode collapse, and to constrain the randomness of the synthesis results, Mirza et al. proposed conditional GANs (CGANs) [14], which synthesize images that meet specific requirements by adding conditions to the random vector. Isola et al. [2] introduced the concept of image translation by removing the random vector input and directly using a complete image as the input condition. However, the translation mechanism of Pix2Pix requires paired image data, which places high demands on the dataset. Therefore, Zhu et al. and others proposed unsupervised image style transfer models such as CycleGAN [16] and StyleGAN [17]. Another interesting piece of research [18] proposed an improved 3D U-Net embedded in a GAN framework to segment the liver in 3D, suggesting that a GAN-based approach may enhance the performance of medical segmentation tasks. Recently, image generation has become an extremely popular research topic, with some efficient models already more effective than GAN-based mechanisms. For instance, the latest variational autoencoder (VAE)-based diffusion models [19,20] and the text-to-image models developed by OpenAI [21] have yielded impressive results. However, such models are not fully open source and require huge amounts of data for training.

2.2. Medical Image Generation

The high cost of medical image acquisition and the need for professional manual annotation pose a series of challenges for building large-scale annotated medical image datasets. This problem can be effectively mitigated by augmenting a dataset with realistic pseudo-data synthesized by GANs and their derived models [12,22]. This includes the synthesis of new data from random noise, as reported in [23]. Additionally, for rare diseases with low data volumes, images can be synthesized with target lesion characteristics based on specific input tags [24,25]. These synthesized images can be used to supplement rare cases in a dataset, as well as for teaching and training purposes. The study of 3D liver image synthesis in this paper also belongs to this field of application. Similar related studies can be found in [6,8]: Costa et al. [8] first proposed synthesizing fundus retinal images with a Pix2Pix-style model from 2D retinal vessel segmentation labels, while Liang et al. [6] added a class feature loss and an improved retinal detail loss to jointly constrain the synthesis results and provide more detail. In another piece of research, Mendes et al. [7] synthesized lung images with specific lesions based on both lung and lesion segmentation labels. However, such work was primarily oriented towards the synthesis of 2D images, and synthesizing lesions required a separate input of the specific lesion labels for supervision. Ying et al. [26] used a GAN to reconstruct 3D lung CT images from 2D X-ray images, the first application of using 2D data to synthesize 3D images. These previous 3D image generation tasks did not introduce prior medical knowledge to guide the synthesis.

2.3. Multi-Task Learning

In this paper, we propose a novel multi-task method for 3D liver CT image generation. One advantage of multi-task learning is that different tasks can share parameters and assist each other, eventually achieving joint improvement across the tasks [27,28]. For example, Pu et al. [29] obtained better performance by splitting standard plane recognition into a plane-type recognition task and a standard/non-standard classification task, with the plane classification task assisting the standard/non-standard task. Similarly, Zhao et al. [30] showed that the performance of different tasks can be improved by combining multiple tasks in the quality control of ultrasound images. To improve neuroimage quality, Wang et al. [31] proposed a multi-task deep learning method to jointly synthesize multi-contrast neuroimages using signal relaxation relationships and spatial information. Huang et al. [32] designed a multi-task coherent modality transferable GAN, i.e., MCMT-GAN, to tackle the issue of brain MRI synthesis in an unsupervised manner. The weights of the tasks in previous multi-task methods were set manually, potentially resulting in one task being overfitted while another had not yet converged. More details of multi-task learning are reported in [33].

3. Materials and Methods

3.1. Dataset Description and Processing

The image generation model requires a large amount of real data for training. In this paper, we require complete 3D liver segmentation images and corresponding liver vessel segmentation labels to conduct the experiments. The commonly used public dataset with liver vessel segmentation labels, 3D-IRCADb [34], has only 20 patient scan sequences, and each complete sequence contains only about 30 slices. As for the MSD [35] dataset, although the number of patients is high, it lacks accurate liver contour segmentation labels, and the sequence length is also generally only around 50 slices. Therefore, we used the LiVS [36] dataset to evaluate the performance of our model. This dataset includes 515 complete patient CT image sequences containing 82,428 contrast-enhanced CT liver slices, of which 15,449 slices have been manually annotated for liver vessel segmentation. Because manually annotating vessels is laborious and difficult, efficient liver vessel segmentation models [37,38,39] can be used to predict the remaining segmentation labels. We therefore trained the liver vessel segmentation model of [39] on the expert-annotated portion of the dataset and used the trained model to predict the segmentation labels for the remaining slices.
In selecting the sequence length of the input 3D image, i.e., the size of the 3D image in the axial direction, we acquired image sequences with lengths ranging from 70 to 300 slices. In order to unify the size of the input 3D images and reduce the computational load, we divided a complete liver CT sequence into fixed-length segments, with adjacent segments overlapping by half a segment to ensure continuity between segments and to achieve data augmentation (as shown in Figure 2).
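The snippet below is a minimal sketch of this overlapping splitting, assuming the CT volume is stored as a NumPy array ordered (slices, height, width); the function name and the segment length of 48 are illustrative assumptions, not the original implementation.

```python
import numpy as np

def split_into_overlapping_segments(volume, segment_len=48):
    """Split a CT volume of shape (num_slices, H, W) into fixed-length
    axial segments, with adjacent segments sharing half of their slices."""
    stride = segment_len // 2  # 50% overlap between neighbouring segments
    starts = range(0, volume.shape[0] - segment_len + 1, stride)
    return np.stack([volume[s:s + segment_len] for s in starts])

# Example: a 130-slice scan yields segments starting at slices 0, 24, 48, 72.
dummy_scan = np.zeros((130, 320, 320), dtype=np.float32)
print(split_into_overlapping_segments(dummy_scan).shape)  # (4, 48, 320, 320)
```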

3.2. 3DMT-GAN

3.2.1. Overview

Our model (Figure 3) includes two main components, the generator and the discriminator. The vascular mask slices are input as a fixed-length 2D sequence and fed through the generator to produce a 2D liver image sequence, i.e., a 3D liver image. The discriminator then judges whether the generated liver is a real 3D liver image or not.

3.2.2. Generation Task Based on Segmentation Task Guide

In this paper, we introduce a real liver segmentation mask to guide the synthesized image on top of the basic pixel loss function. The general method of calculating the $L_1$ loss counts the pixel losses of the whole image. However, in medical generation tasks, the image always contains a large amount of background, e.g., regions with pixel values of 0. These backgrounds not only consume considerable computational resources when computing the loss but also reduce the model's focus on generating the liver region. To solve this issue, we added a supervised liver segmentation task to the image synthesis model and calculated the average pixel loss over the liver region. The loss function of the segmentation task is as follows:
$$ L_s^G = \left( L_{bce}\left(m_p, m_{gt}\right) + L_{dsc}\left(m_p, m_{gt}\right) \right) / 2, \quad (1) $$
where $L_{bce}$ denotes the cross-entropy loss function, $L_{dsc}$ denotes the Dice loss function, $m_p$ represents the predicted liver segmentation mask, and $m_{gt}$ denotes the ground truth. These two loss functions are defined as follows:
$$ L_{bce} = -\frac{1}{N} \sum_{i=1}^{N} \left[ m_{gt}^{i} \log m_{p}^{i} + \left(1 - m_{gt}^{i}\right) \log\left(1 - m_{p}^{i}\right) \right], \quad (2) $$
$$ L_{dsc} = 1 - \frac{2 \left| m_{gt} \cap m_{p} \right|}{\left| m_{gt} \right| + \left| m_{p} \right|}, \quad (3) $$
where $m_{gt}^{i}$ and $m_{p}^{i}$ represent the i-th element of the ground-truth mask and the predicted mask, respectively, and $N$ represents the number of samples. In addition to generating the segmentation mask, we also need to synthesize the liver image. The loss of the liver image generation task is defined as follows:
$$ L_t^G = \ell\left(l_g, l_{gt}\right) \times m_{gt} \,/\, \mathrm{sum}\left(m_{gt}\right), \quad (4) $$
$$ \ell\left(l_g, l_{gt}\right) = \frac{1}{HWZ} \sum_{i=1}^{H} \sum_{j=1}^{W} \sum_{k=1}^{Z} \left| l_{ijk} - \hat{l}_{ijk} \right|, \quad (5) $$
where $l_g$ represents the generated pseudo-liver image, $l_{gt}$ represents the real liver image, $m_{gt}$ represents the real liver segmentation mask, and $\mathrm{sum}(m_{gt})$ represents the number of pixels in the liver region of the liver segmentation mask. In the synthetic liver image task, the absolute pixel-wise difference between the synthetic image and the real image is computed, and the resulting loss map is multiplied pixel by pixel by the real liver segmentation label. The pixel loss of the liver region is thereby preserved while the pixel loss of the non-liver region is set to 0; finally, the average pixel loss of the liver region is obtained by dividing by the number of pixels in the liver segmentation region. This reduces the computational effort and allows our model to focus on the generation of liver regions rather than on the large background regions.
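The following is a minimal PyTorch sketch of this liver-masked pixel loss, following the textual description above (elementwise absolute difference, masking by the ground-truth liver mask, then averaging over liver voxels only); the tensor shapes and function name are illustrative assumptions.

```python
import torch

def liver_region_l1_loss(gen_liver, real_liver, liver_mask, eps=1e-8):
    """Average absolute pixel error restricted to the liver region.

    gen_liver, real_liver: tensors of shape (B, 1, D, H, W)
    liver_mask: binary ground-truth liver mask with the same shape
    """
    abs_diff = torch.abs(gen_liver - real_liver)    # per-voxel |l_g - l_gt|
    masked = abs_diff * liver_mask                  # zero out background voxels
    return masked.sum() / (liver_mask.sum() + eps)  # mean over liver voxels only
```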
Our multi-task generator adopts a U-Net-like [40,41] encoder–decoder architecture with skip connections, except that we add a new task branch to the decoder to form a Y-shaped generator: the two decoder branches are responsible for the two tasks, liver image generation and liver region segmentation. The former branch is activated by the $\tanh$ function and supervised by the real liver image using the 3D $L_1$ loss (Equation (5)), while the latter branch is activated by the $\mathrm{sigmoid}$ function and supervised by the liver segmentation label using Equations (2) and (3). The image synthesis task therefore produces both a liver image and a liver mask, and the total loss function contains two parts, defined as follows (Equation (6)):
$$ L_G = \alpha L_s^G + (1 - \alpha) L_t^G, \quad (6) $$
where $L_G$ represents the total generator loss, $L_t^G$ denotes the loss of the liver image synthesis, i.e., the texture loss, and $L_s^G$ denotes the loss of the synthetic liver segmentation mask. The details of the generator encoder configuration are listed in Table 1.
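Below is a compact PyTorch sketch of a Y-shaped generator of the kind described above: one shared 3D encoder and two decoder branches with skip connections, a tanh output for the liver volume and a sigmoid output for the liver mask. The layer count and channel widths are illustrative and do not reproduce the exact configuration in Table 1.

```python
import torch
import torch.nn as nn

def down(c_in, c_out):
    return nn.Sequential(nn.Conv3d(c_in, c_out, 4, stride=2, padding=1),
                         nn.InstanceNorm3d(c_out), nn.LeakyReLU(0.2, inplace=True))

def up(c_in, c_out):
    return nn.Sequential(nn.ConvTranspose3d(c_in, c_out, 4, stride=2, padding=1),
                         nn.InstanceNorm3d(c_out), nn.ReLU(inplace=True))

class YShapedGenerator(nn.Module):
    """Shared encoder with two decoder branches: liver synthesis (tanh)
    and liver mask segmentation (sigmoid)."""
    def __init__(self, base=16):
        super().__init__()
        self.enc1, self.enc2, self.enc3 = down(1, base), down(base, 2 * base), down(2 * base, 4 * base)
        # Texture (image synthesis) branch with U-Net-style skip connections.
        self.tex3, self.tex2, self.tex1 = up(4 * base, 2 * base), up(4 * base, base), up(2 * base, base)
        self.tex_out = nn.Sequential(nn.Conv3d(base, 1, 3, padding=1), nn.Tanh())
        # Segmentation branch, mirroring the texture branch.
        self.seg3, self.seg2, self.seg1 = up(4 * base, 2 * base), up(4 * base, base), up(2 * base, base)
        self.seg_out = nn.Sequential(nn.Conv3d(base, 1, 3, padding=1), nn.Sigmoid())

    def forward(self, vessels):
        e1 = self.enc1(vessels)
        e2 = self.enc2(e1)
        e3 = self.enc3(e2)
        t = self.tex1(torch.cat([self.tex2(torch.cat([self.tex3(e3), e2], 1)), e1], 1))
        s = self.seg1(torch.cat([self.seg2(torch.cat([self.seg3(e3), e2], 1)), e1], 1))
        return self.tex_out(t), self.seg_out(s)

# Example: a 48-slice vessel mask segment produces a liver volume and a liver mask.
liver, mask = YShapedGenerator()(torch.rand(1, 1, 48, 64, 64))
print(liver.shape, mask.shape)  # both torch.Size([1, 1, 48, 64, 64])
```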

3.2.3. Multi-Task Generator Optimized by MGDA-UB

One key issue in a multi-task framework is balancing the weights of the different tasks: some tasks are more difficult and require a larger weight, while others are simpler and require a smaller weight. The core idea of multi-task learning is to exploit the parameters shared by multiple related tasks during training to improve the performance of the model on each task and to improve its generalization ability. Our multi-task generator is a typical multi-task learning model with hard parameter sharing for two tasks, so the key is assigning the weights of the composite loss function formed by the tasks. There are several mature optimization strategies for this problem, such as the hard parameter sharing model [42] with shared and task-specific parameters, and the use of fixed or dynamic weights to construct compound loss functions for better training efficiency [43,44]. In this paper, we select the upper-bound-optimized multiple gradient descent algorithm (MGDA-UB) [45] to dynamically update the weights of the composite loss function at each iteration. This method has been shown to outperform fixed per-task weights as well as the gradient normalization method (GradNorm) [43], and it is computationally more efficient than the original multiple gradient descent algorithm (MGDA) [44]. The dynamic weight of each task is updated according to the KKT (Karush–Kuhn–Tucker) conditions [46,47] to find a Pareto optimal point [48].
In Equation (6), the weights of the two tasks are $\alpha$ and $1 - \alpha$, respectively. Based on the MGDA-UB algorithm, the optimization objective of our model is:
$$ \min_{\alpha \in [0, 1]} \left\| \alpha \nabla_{Z} \hat{L}_1\left(\theta_{sh}, \theta_1\right) + (1 - \alpha) \nabla_{Z} \hat{L}_2\left(\theta_{sh}, \theta_2\right) \right\|_2^2, \quad (7) $$
where $\alpha$ ranges from 0 to 1, $\theta_{sh}$ represents the parameters shared between the different tasks of the model, which in this paper correspond to the encoder, and $\theta_1$ and $\theta_2$ denote the parameters of the two task-specific branches, which in this paper correspond to the parameters of the two decoders. $\hat{L}_1$ and $\hat{L}_2$ represent the loss functions of the different tasks, $[\cdot]_{+, \frac{1}{T}}$ denotes the clip function defined in Equation (9), and $Z$ represents the bottleneck representation obtained by downsampling the input vessels in the model. Finally, in each iteration, $\alpha$ can be updated as:
$$ \hat{\alpha} = \left[ \frac{\left( \nabla_{Z} \hat{L}_2\left(\theta_{sh}, \theta_2\right) - \nabla_{Z} \hat{L}_1\left(\theta_{sh}, \theta_1\right) \right)^{T} \nabla_{Z} \hat{L}_2\left(\theta_{sh}, \theta_2\right)}{\left\| \nabla_{Z} \hat{L}_1\left(\theta_{sh}, \theta_1\right) - \nabla_{Z} \hat{L}_2\left(\theta_{sh}, \theta_2\right) \right\|_2^2} \right]_{+, \frac{1}{T}}, \quad (8) $$
$$ [k]_{+, \frac{1}{T}} = \max\left( \min\left(k, 1\right), 0 \right). \quad (9) $$
The main idea of MGDA is to find a single gradient direction for multi-task learning that reduces the losses of all tasks at the same time. MGDA represents the gradients of the different tasks as vectors in a gradient space and then finds a new gradient direction that has a certain degree of projection onto the gradient of each task. In this way, a direction can be found that combines the gradient information of all tasks.
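For two tasks, the MGDA-UB weight update of Equations (8) and (9) has the closed form sketched below in PyTorch; the gradients with respect to the shared bottleneck Z are assumed to be flattened 1-D tensors, and the function names are illustrative.

```python
import torch

def clip_alpha(a: torch.Tensor) -> torch.Tensor:
    """The [.]_{+,1/T} operation of Equation (9): clip to the range [0, 1]."""
    return torch.clamp(a, 0.0, 1.0)

def mgda_ub_alpha(grad_z_task1: torch.Tensor, grad_z_task2: torch.Tensor) -> torch.Tensor:
    """Closed-form weight of task 1 for the two-task case (Equation (8)).

    grad_z_task1 / grad_z_task2: gradients of the two task losses with respect
    to the shared bottleneck representation Z, e.g. obtained in each iteration
    with torch.autograd.grad(loss_i, z, retain_graph=True) and flattened.
    """
    diff = grad_z_task2 - grad_z_task1
    alpha = (diff @ grad_z_task2) / (diff @ diff).clamp_min(1e-12)
    return clip_alpha(alpha)

# The resulting alpha then weights the composite generator loss of Equation (6):
# L_G = alpha * L_s^G + (1 - alpha) * L_t^G.
```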

3.2.4. Patch Discriminator

In [49], Demir et al. proposed determining the authenticity of a composite image patch by patch rather than producing a single authenticity output for the whole image. Inspired by this work, we incorporated this core idea into our discriminator. In contrast to [49], we employed 3D convolution operations, and the adversarial loss function of our patch discriminator is composed as follows:
$$ L_D = \mathbb{E}\left[ \log D\left(v, l_r\right) \right] + \mathbb{E}\left[ \log\left( 1 - D\left(v, l_g\right) \right) \right], \quad (10) $$
where $D$ represents the discriminator, $v$ represents the input real vascular structure, $l_r$ represents the corresponding real liver, and $l_g$ represents the liver generated by the generator. The output of the discriminator is a patch-wise probability for the different input 3D patches, i.e., a 3D matrix taking values between 0 and 1. The final loss function of our whole model is as follows:
$$ L = \beta L_G + L_D, \quad (11) $$
where the parameter $\beta$ is set to 100, consistent with the Pix2Pix [2] model. The choice of $\beta$ is a trade-off between adversarial realism and faithful restoration: the larger the value, the more the synthesis results are biased towards faithfully restoring the corresponding original image, and the smaller the value, the more weight is placed on the adversarial realism of the result. The details of the patch discriminator configuration are listed in Table 2.
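Below is a minimal PyTorch sketch of a 3D PatchGAN-style discriminator of the kind described here: a stack of strided 3D convolutions that outputs a grid of per-patch probabilities rather than a single scalar. The channel widths and layer count are illustrative and do not reproduce the exact configuration in Table 2.

```python
import torch
import torch.nn as nn

class PatchDiscriminator3D(nn.Module):
    """3D patch discriminator: returns one real/fake probability per 3D patch."""
    def __init__(self, in_channels=2, base=32):
        super().__init__()
        layers, channels = [], [in_channels, base, 2 * base, 4 * base]
        for c_in, c_out in zip(channels[:-1], channels[1:]):
            layers += [nn.Conv3d(c_in, c_out, 4, stride=2, padding=1),
                       nn.InstanceNorm3d(c_out),
                       nn.LeakyReLU(0.2, inplace=True)]
        layers += [nn.Conv3d(channels[-1], 1, 3, padding=1), nn.Sigmoid()]
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        # x: vessel mask and (real or generated) liver volume concatenated
        # along the channel dimension, shape (B, C, D, H, W).
        return self.net(x)

# Example with dummy tensors: vessel mask + liver volume as two channels.
x = torch.cat([torch.rand(1, 1, 32, 64, 64), torch.rand(1, 1, 32, 64, 64)], dim=1)
print(PatchDiscriminator3D(in_channels=2)(x).shape)  # torch.Size([1, 1, 4, 8, 8])
```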

4. Experimental Results

4.1. Implementation Settings

We designed experiments with different input sequence lengths for the segmentation task, as listed in Table 3. For the training settings, we set the batch size to 2, used the Adam optimizer with a learning rate of 0.00001, and set the number of epochs to 100. We implemented the source code using the PyTorch framework and trained the model on a Linux server with a single Nvidia A100-40 GB GPU. All liver segments were divided into training, validation, and testing sets in a ratio of 7:2:1. For vessel segmentation, we loaded the pre-trained weights of the model in [39] and then continued to fine-tune them on our data; the mask of the segmented vessels was then used as part of the input data for the 3D generation task. On the validation set (i.e., all the vascular segmentation masks in our experiment), the DSC was 0.803.

4.2. Evaluation Metrics

Three evaluation metrics, FID (Fréchet inception distance) [50], KID (kernel inception distance) [51], and LPIPS (learned perceptual image patch similarity) [52], were used to evaluate the performance of the synthesized task. These three metrics are defined as follows:
$$ FID\left(l_g, l_p\right) = \left\| \mu_g - \mu_p \right\|_2^2 + \mathrm{Tr}\left( \Sigma_g + \Sigma_p - 2\left( \Sigma_g \Sigma_p \right)^{\frac{1}{2}} \right), \quad (12) $$
$$ KID\left(l_g, l_p\right) = \mathrm{MMD}_k\left(l_g, l_p\right)^2, \quad (13) $$
$$ LPIPS\left(f_g, f_p\right) = \sum_{l=1}^{L} w_l \left\| \Phi_l\left(f_g\right) - \Phi_l\left(f_p\right) \right\|_2^2, \quad (14) $$
where $l_g$ and $l_p$ represent the high-dimensional feature vectors of the real liver image and the synthetic liver image extracted with the Inception V3 network, respectively; $\mu_g$ and $\mu_p$ represent the means of these two vectors; $\Sigma_g$ and $\Sigma_p$ represent their covariances; and $\mathrm{Tr}$ represents the trace operation. In Equation (14), $f_g$ and $f_p$ represent the real liver image and the generated liver image, $L$ represents the number of layers of the VGG network used for feature extraction, $w_l$ represents the weight of layer $l$, and $\Phi_l(f)$ represents the feature map of image $f$ at layer $l$. $\|\cdot\|_2$ represents the L2 norm, i.e., the Euclidean length of a vector.
FID and KID were used to assess whether the ground truth and the synthesized images follow the same distribution. The FID was calculated using the Inception V3 network [53]: by removing the final fully connected and pooling layers, a high-dimensional feature vector can be obtained; in this paper, we used the 2048-dimensional features. The vectors extracted by the Inception V3 network from our real liver CT images follow a specific distribution; if the feature vectors extracted from the synthesized liver images follow the same distribution, then the synthesized images have a high degree of realism. In other words, the smaller the distance between the feature vectors of the real and synthesized images, the stronger the performance of the model. The calculation of KID is similar to that of FID in that it also extracts high-dimensional feature vectors from the Inception V3 network, but the maximum mean discrepancy (MMD) is computed between the two sets of features instead of the Fréchet distance used by FID. The KID and FID statistics can be obtained quickly by installing and using the Python package torch-fidelity [54].
LPIPS measures the perceptual similarity between a generated image and the corresponding real image in the feature space of a deep network, prioritizing perceived similarity over raw pixel differences. In this paper, we used the Python package lpips to compute LPIPS. Although the focus of this task is not to accurately segment the liver but to demonstrate the authenticity of the synthetic 3D liver image, we still used the DSC coefficient (cf. Equation (3)) to evaluate the error between the synthetic 3D liver and the real liver. FID, KID, and LPIPS were used to evaluate the detailed texture of the synthetic images, while DSC was used to evaluate their contours.
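As a hedged illustration of how these metrics could be computed with the packages mentioned above, the snippet below uses torch-fidelity for FID/KID on two image folders and the lpips package (with a VGG backbone, matching the description above) for LPIPS; the folder paths are hypothetical placeholders.

```python
import torch
import torch_fidelity
import lpips

# FID and KID between directories of real and synthesized slices
# ('slices/real' and 'slices/synthetic' are placeholder paths).
metrics = torch_fidelity.calculate_metrics(
    input1='slices/real', input2='slices/synthetic',
    fid=True, kid=True, cuda=torch.cuda.is_available(), verbose=False,
)
print(metrics['frechet_inception_distance'], metrics['kernel_inception_distance_mean'])

# LPIPS between two image tensors scaled to [-1, 1] with shape (B, 3, H, W).
lpips_fn = lpips.LPIPS(net='vgg')
real = torch.rand(1, 3, 320, 320) * 2 - 1
fake = torch.rand(1, 3, 320, 320) * 2 - 1
print(lpips_fn(real, fake).item())
```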

4.3. Baseline Methods

We compare against three classic GAN-based methods used in medical image synthesis: CGAN [14], Pix2Pix [2], and CycleGAN [16]. The first two are trained with supervision, while the latter is trained without supervision. The training strategies of Pix2Pix and CGAN are very similar; the main difference lies in the generator's input and upsampling settings. CGAN fuses random noise with the input conditions to produce varied synthetic results, whereas Pix2Pix uses only images as the input condition and relies on dropout in the generator's upsampling stage to achieve diversity in the synthetic results. CycleGAN, on the other hand, uses unsupervised training and consists of two pairs of generators and discriminators. Compared with the first two models, CycleGAN does not require paired datasets for training but demands a larger volume of training data.

4.4. Quantitative Evaluation

As shown in Table 4, compared with CGAN, Pix2Pix, and CycleGAN on three different input sequence lengths and four evaluation metrics, our method obtained the best performance in 10 cases, indicating the effectiveness of the proposed method. From Table 4, the texture of the synthesized images was best (i.e., the FID, KID, and LPIPS values were generally the lowest) when the sequence length was 48 (i.e., 48 s), while the contour of the synthesized images was best (i.e., the DSC was highest) when the sequence length was 64 (i.e., 64 s). Analysing the experimental results, in terms of the accuracy of the contour of the final synthesized image, the longer the sequence length, the more accurate the contour of the generated image. In terms of the realism of the generated texture, a sequence length that is too long or too short degrades the texture details, so an appropriate sequence length (i.e., 48) should be chosen. For a sequence length of 32, each segment contains less information about the overall structure of the vessels, and the synthesized images were poorer in both contour and detailed texture. For a more direct display of the results, we give a line plot of the comparison results in Figure 4.

4.5. Visualization of the Results

Figure 5, Figure 6 and Figure 7 show visualizations of the results of all methods, including the synthetic 2D tomography slices, 3D CT images, and 3D liver reconstructions, respectively. The visualization of the synthetic images from our model and the three comparison models is shown in Figure 5. Comparing the 2D slices with the corresponding real images shows that the synthesis results of the supervised methods Pix2Pix and CGAN and of the proposed method basically fit the style of the real liver CT images, whereas CycleGAN, trained in an unsupervised manner, cannot perform a correct synthesis. The texture styles synthesized by Pix2Pix and CGAN still deviate somewhat from the real liver slice images, while our method achieved the best synthesis performance.
Figure 6 shows the 3D visualization of the internal structure of the synthetic liver segmentation images from three different viewpoints (i.e., 2D slices are stacked to synthesize 3D images). It can be seen from Figure 6 that the CycleGAN method can only retain the input vascular information and cannot synthesize more accurate internal liver structures, while the three other models can all synthesize some internal liver anatomical structures.
Figure 7 shows a visualization of the geometry of the synthetic livers. We extracted the contour information of the synthetic 3D liver image data and performed a 3D reconstruction to show how the synthetic liver images differ from the real liver in terms of geometry. It can be seen from the figure that the 3D liver image synthesized by our method is closer to the real liver in terms of contours, while the external contours produced by the comparison models all show large errors.

5. Discussion

5.1. Discussion of the Discriminator Inputs

In this paper, there is an issue of determining what is fed into the discriminator. As shown in Figure 8, two types of input can be used by the discriminator to make a judgment. In the first, the synthetic liver and the synthetic liver segmentation mask are multiplied together to filter out the background and then input into the discriminator together with the vessel segmentation mask, i.e., two image channels. In the second, the liver vessel segmentation mask, the synthetic liver, and the synthetic liver mask are input as three separate image channels, as illustrated in Figure 8. With the second type of input, the texture features produced by the generator were more realistic and detailed, and closer to the real image. Figure 9 shows more direct synthetic image results for the two input types (two or three channels); there, we can again observe that the detail of the synthetic image with three channels was closer to the real image.
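For illustration, a small PyTorch sketch of how the two discriminator input variants discussed above could be assembled (channel concatenation of 3D volumes); the tensor names and shapes are assumptions for demonstration.

```python
import torch

def make_discriminator_input(vessel_mask, synth_liver, synth_mask, three_channel=True):
    """Build the conditional discriminator input.

    three_channel=True : vessel mask, synthetic liver and synthetic liver mask
                         stacked as three separate channels.
    three_channel=False: the synthetic liver is first multiplied by the synthetic
                         mask to filter out the background, then stacked with the
                         vessel mask as two channels.
    """
    if three_channel:
        return torch.cat([vessel_mask, synth_liver, synth_mask], dim=1)
    return torch.cat([vessel_mask, synth_liver * synth_mask], dim=1)

# Dummy volumes of shape (batch, channel, depth, height, width).
v = torch.randint(0, 2, (1, 1, 32, 64, 64)).float()
l = torch.rand(1, 1, 32, 64, 64)
m = torch.randint(0, 2, (1, 1, 32, 64, 64)).float()
print(make_discriminator_input(v, l, m, True).shape)   # torch.Size([1, 3, 32, 64, 64])
print(make_discriminator_input(v, l, m, False).shape)  # torch.Size([1, 2, 32, 64, 64])
```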

5.2. Advantages of the Multi-Task Generator

5.2.1. Discussion of the Differences in Texture Details

A single-task generator does not consider the correlation between tasks and lacks a mechanism for sharing information among multiple related tasks, which limits the learning ability of the model during training. Our multi-task generator can share information between the liver segmentation and liver synthesis tasks, and the liver segmentation label can also supervise the liver synthesis task. Among the comparison methods, Pix2Pix and CGAN are single-task generators. As shown in Figure 10, the region marked by the red circle is the portal vein in the real liver image, a structure that is usually contained within the liver region. In the comparison models Pix2Pix and CGAN, this structure was generated outside the liver region. Our model effectively avoids this problem, an improvement brought about by the multi-task generator using a segmentation task to guide the generation task.

5.2.2. Discussion of the Effect of Lesion Synthesis

As shown in Figure 11, we give some synthetic visualizations of images containing lesions. Specifically, Figure 11 shows six tomographic scans of the liver region and the corresponding synthetic results for three patients, two with space-occupying lesions and one with advanced cancer. The red arrows indicate areas with significant differences between the liver images produced by the different generators. In the single-task models, only the L1 loss is used to constrain the signal intensity of the whole image; as a result, the synthesized image only retains the basic contour structure, and the internal texture features are ignored. This disparity is reflected in the fact that regions with weaker signals in the real image are treated as signal-free regions in the synthesis results. In contrast, our method performs a better synthesis even though some images contain lesioned regions.

5.3. Limitations of Our Method

Although our method achieved the best synthetic image performance, especially when synthesizing normal liver images, it still has some limitations. For example, our model struggles with the prediction and reconstruction of certain liver images with large lesions, such as advanced cancer and large space-occupying lesions. Because our method uses liver vessels to synthesize liver images, and images with particularly large lesion areas often contain many vascular structures that may themselves be lesioned, the synthesis results for such cases are relatively poor compared to normal images. Meanwhile, the number of such samples is also very small, which further limits the model's performance. For liver samples with only minor lesions, our model is less affected and performs better, because the vascular structures corresponding to such diseases are often not clearly distinguishable from those of normal livers.
In addition, manually annotated vessel labels still appear visually disconnected after 3D reconstruction, indicating that it remains difficult to obtain accurate vascular structure labels; clear vascular structural information is indispensable for synthesizing 3D liver images. To extend the proposed method to clinical applications, we need to collect imaging data from more patients in the future, and we plan to build a database of liver vascular images. In addition, we need to explore whether other anatomical structures within the liver can be generated.

6. Conclusions

In this paper, we have presented a multi-task generative adversarial network for synthesizing 3D liver images. To the best of our knowledge, this is the first application to the 3D liver synthesis task. Specifically, we used a vascular segmentation mask as the input because it contains structural information about a variety of rich anatomical structures, and we proposed using liver segmentation masks to supervise and guide the liver synthesis. We also introduced a stable multiple gradient descent algorithm to balance the weight distribution among the tasks, and we utilized a masking mechanism in the discriminator and generator to filter out the large background regions, making our model focus on generating the liver region. Experiments on real data show that our method yields improvements in both the quantitative and the visual analyses, indicating that it is feasible to synthesize the liver from a vessel segmentation mask and that our multi-task generation method is effective. However, our model also has some limitations, such as being sensitive to liver images containing a large number of lesions. In the future, we will focus on using multimodal information to synthesize 3D livers.

Author Contributions

Conceptualization, Q.X.; Methodology, Q.X.; Software, Q.X.; Validation, Q.X.; Resources, L.Z.; Data curation, L.Z.; Writing—original draft, Q.X.; Writing—review and editing, L.Z.; Visualization, Q.X.; Project administration, L.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work was collectively supported by the National Natural Science Foundation of China (32060150) and the Advantages Discipline Group (Medicine) Project in Higher Education of Hubei Province (2021–2025) (2022XKQT5).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are openly available in the LiVS dataset [36].

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Huettl, F.; Saalfeld, P.; Hansen, C.; Preim, B.; Poplawski, A.; Kneist, W.; Lang, H.; Huber, T. Virtual reality and 3D printing improve preoperative visualization of 3D liver reconstructions—Results from a preclinical comparison of presentation modalities and user’s preference. Ann. Transl. Med. 2021, 9, 1074. [Google Scholar] [CrossRef]
  2. Isola, P.; Zhu, J.Y.; Zhou, T.; Efros, A.A. Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1125–1134. [Google Scholar]
  3. Lu, Y.; Li, K.; Pu, B.; Tan, Y.; Zhu, N. A YOLOX-based Deep Instance Segmentation Neural Network for Cardiac Anatomical Structures in Fetal Ultrasound Images. IEEE/ACM Trans. Comput. Biol. Bioinform. 2022, 1–12. [Google Scholar] [CrossRef]
  4. Wu, X.; Tan, G.; Pu, B.; Duan, M.; Cai, W. DH-GAC: Deep hierarchical context fusion network with modified geodesic active contour for multiple neurofibromatosis segmentation. Neural Comput. Appl. 2022, 1–16. [Google Scholar] [CrossRef]
  5. Pu, B.; Lu, Y.; Chen, J.; Li, S.; Zhu, N.; Wei, W.; Li, K. Mobileunet-fpn: A semantic segmentation model for fetal ultrasound four-chamber segmentation in edge computing environments. IEEE J. Biomed. Health Inform. 2022, 26, 5540–5550. [Google Scholar] [CrossRef] [PubMed]
  6. Liang, N.; Yuan, L.; Wen, X.; Xu, H.; Wang, J. End-To-End Retina Image Synthesis Based on CGAN Using Class Feature Loss and Improved Retinal Detail Loss. IEEE Access 2022, 10, 83125–83137. [Google Scholar] [CrossRef]
  7. Mendes, J.; Pereira, T.; Silva, F.; Frade, J.; Morgado, J.; Freitas, C.; Negrão, E.; de Lima, B.F.; da Silva, M.C.; Madureira, A.J.; et al. Lung CT image synthesis using GANs. Expert Syst. Appl. 2023, 215, 119350. [Google Scholar] [CrossRef]
  8. Costa, P.; Galdran, A.; Meyer, M.I.; Niemeijer, M.; Abràmoff, M.; Mendonça, A.M.; Campilho, A. End-to-end adversarial retinal image synthesis. IEEE Trans. Med. Imaging 2017, 37, 781–791. [Google Scholar] [CrossRef] [PubMed]
  9. Jabbarpour, A.; Mahdavi, S.R.; Sadr, A.V.; Esmaili, G.; Shiri, I.; Zaidi, H. Unsupervised pseudo CT generation using heterogenous multicentric CT/MR images and CycleGAN: Dosimetric assessment for 3D conformal radiotherapy. Comput. Biol. Med. 2022, 143, 105277. [Google Scholar] [CrossRef] [PubMed]
  10. Skandarani, Y.; Jodoin, P.M.; Lalande, A. Gans for medical image synthesis: An empirical study. J. Imaging 2023, 9, 69. [Google Scholar] [CrossRef]
  11. Yi, X.; Walia, E.; Babyn, P. Generative adversarial network in medical imaging: A review. Med. Image Anal. 2019, 58, 101552. [Google Scholar] [CrossRef] [Green Version]
  12. Chen, Y.; Yang, X.H.; Wei, Z.; Heidari, A.A.; Zheng, N.; Li, Z.; Chen, H.; Hu, H.; Zhou, Q.; Guan, Q. Generative adversarial networks in medical image augmentation: A review. Comput. Biol. Med. 2022, 144, 105382. [Google Scholar] [CrossRef] [PubMed]
  13. Radford, A.; Metz, L.; Chintala, S. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv 2015, arXiv:1511.06434. [Google Scholar]
  14. Mirza, M.; Osindero, S. Conditional generative adversarial nets. arXiv 2014, arXiv:1411.1784. [Google Scholar]
  15. Yu, B.; Zhou, L.; Wang, L.; Shi, Y.; Fripp, J.; Bourgeat, P. Ea-GANs: Edge-Aware Generative Adversarial Networks for Cross-Modality MR Image Synthesis. IEEE Trans. Med Imaging 2019, 38, 1750–1762. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  16. Zhu, J.Y.; Park, T.; Isola, P.; Efros, A.A. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2223–2232. [Google Scholar]
  17. Karras, T.; Laine, S.; Aila, T. A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 4401–4410. [Google Scholar]
  18. He, R.; Xu, S.; Liu, Y.; Li, Q.; Liu, Y.; Zhao, N.; Yuan, Y.; Zhang, H. Three-Dimensional Liver Image Segmentation Using Generative Adversarial Networks Based on Feature Restoration. Front. Med. 2021, 8, 794969. [Google Scholar] [CrossRef]
  19. Ho, J.; Jain, A.; Abbeel, P. Denoising diffusion probabilistic models. Adv. Neural Inf. Process. Syst. 2020, 33, 6840–6851. [Google Scholar]
  20. Nichol, A.Q.; Dhariwal, P. Improved denoising diffusion probabilistic models. In Proceedings of the International Conference on Machine Learning, PMLR, Virtual, 18 July 2021; pp. 8162–8171. [Google Scholar]
  21. Ramesh, A.; Dhariwal, P.; Nichol, A.; Chu, C.; Chen, M. Hierarchical text-conditional image generation with clip latents. arXiv 2022, arXiv:2204.06125. [Google Scholar]
  22. Shin, H.C.; Tenenholtz, N.A.; Rogers, J.K.; Schwarz, C.G.; Senjem, M.L.; Gunter, J.L.; Andriole, K.P.; Michalski, M. Medical image synthesis for data augmentation and anonymization using generative adversarial networks. In Proceedings of the Simulation and Synthesis in Medical Imaging: Third International Workshop, SASHIMI 2018, Held in Conjunction with MICCAI 2018, Granada, Spain, 16 September 2018; pp. 1–11. [Google Scholar]
  23. Han, C.; Hayashi, H.; Rundo, L.; Araki, R.; Shimoda, W.; Muramatsu, S.; Furukawa, Y.; Mauri, G.; Nakayama, H. GAN-based synthetic brain MR image generation. In Proceedings of the 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018), Washington, DC, USA, 4–7 April 2018; pp. 734–738. [Google Scholar]
  24. Oliveira, D.A.B. Implanting synthetic lesions for improving liver lesion segmentation in CT exams. arXiv 2020, arXiv:2008.04690. [Google Scholar]
  25. Jiang, Y.; Chen, H.; Loew, M.; Ko, H. COVID-19 CT image synthesis with a conditional generative adversarial network. IEEE J. Biomed. Health Inform. 2020, 25, 441–452. [Google Scholar] [CrossRef]
  26. Ying, X.; Guo, H.; Ma, K.; Wu, J.; Weng, Z.; Zheng, Y. X2CT-GAN: Reconstructing CT from biplanar X-rays with generative adversarial networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 10619–10628. [Google Scholar]
  27. Hering, A.; Hansen, L.; Mok, T.C.; Chung, A.C.; Siebert, H.; Häger, S.; Lange, A.; Kuckertz, S.; Heldmann, S.; Shao, W.; et al. Learn2Reg: Comprehensive multi-task medical image registration challenge, dataset and evaluation in the era of deep learning. IEEE Trans. Med. Imaging 2022, 42, 697–712. [Google Scholar] [CrossRef]
  28. Zhang, Y.; Yang, Q. A survey on multi-task learning. IEEE Trans. Knowl. Data Eng. 2021, 34, 5586–5609. [Google Scholar] [CrossRef]
  29. Pu, B.; Li, K.; Li, S.; Zhu, N. Automatic fetal ultrasound standard plane recognition based on deep learning and IIoT. IEEE Trans. Ind. Inform. 2021, 17, 7771–7780. [Google Scholar] [CrossRef]
  30. Zhao, L.; Li, K.; Pu, B.; Chen, J.; Li, S.; Liao, X. An ultrasound standard plane detection model of fetal head based on multi-task learning and hybrid knowledge graph. Future Gener. Comput. Syst. 2022, 135, 234–243. [Google Scholar] [CrossRef]
  31. Wang, G.; Gong, E.; Banerjee, S.; Martin, D.; Tong, E.; Choi, J.; Chen, H.; Wintermark, M.; Pauly, J.M.; Zaharchuk, G. Synthesize High-Quality Multi-Contrast Magnetic Resonance Imaging From Multi-Echo Acquisition Using Multi-Task Deep Generative Model. IEEE Trans. Med. Imaging 2020, 39, 3089–3099. [Google Scholar] [CrossRef] [PubMed]
  32. Huang, Y.; Zheng, F.; Cong, R.; Huang, W.; Scott, M.R.; Shao, L. MCMT-GAN: Multi-Task Coherent Modality Transferable GAN for 3D Brain Image Synthesis. IEEE Trans. Image Process. 2020, 29, 8187–8198. [Google Scholar] [CrossRef] [PubMed]
  33. Vandenhende, S.; Georgoulis, S.; Van Gansbeke, W.; Proesmans, M.; Dai, D.; Van Gool, L. Multi-task learning for dense prediction tasks: A survey. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 3614–3633. [Google Scholar] [CrossRef]
  34. Soler, L.; Hostettler, A.; Agnus, V.; Charnoz, A.; Fasquel, J.; Moreau, J.; Osswald, A.; Bouhadjar, M.; Marescaux, J. 3D Image Reconstruction for Comparison of Algorithm Database: A Patient Specific Anatomical and Medical Image Database; Tech. Rep; IRCAD: Strasbourg, France, 2010; Volume 1. [Google Scholar]
  35. Simpson, A.L.; Antonelli, M.; Bakas, S.; Bilello, M.; Farahani, K.; Van Ginneken, B.; Kopp-Schneider, A.; Landman, B.A.; Litjens, G.; Menze, B. A large annotated medical image dataset for the development and evaluation of segmentation algorithms. arXiv 2019, arXiv:1902.09063. [Google Scholar]
  36. Zhao, L. Liver Vessel Segmentation. 2022. Available online: https://doi.org/10.21227/rwys-mk84 (accessed on 30 May 2023).
  37. Yan, Q.; Wang, B.; Zhang, W.; Luo, C.; Xu, W.; Xu, Z.; Zhang, Y.; Shi, Q.; Zhang, L.; You, Z. Attention-guided deep neural network with multi-scale feature fusion for liver vessel segmentation. IEEE J. Biomed. Health Inform. 2020, 25, 2629–2642. [Google Scholar] [CrossRef]
  38. Su, J.; Liu, Z.; Zhang, J.; Sheng, V.S.; Song, Y.; Zhu, Y.; Liu, Y. DV-Net: Accurate liver vessel segmentation via dense connection model with D-BCE loss function. Knowl.-Based Syst. 2021, 232, 107471. [Google Scholar] [CrossRef]
  39. Gao, Z.; Zong, Q.; Wang, Y.; Yan, Y.; Wang, Y.; Zhu, N.; Zhang, J.; Wang, Y.; Zhao, L. Laplacian salience-gated feature pyramid network for accurate liver vessel segmentation. IEEE Trans. Med. Imaging 2023. [Google Scholar] [CrossRef]
  40. Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, 5–9 October 2015; Proceedings, Part III 18; Springer: Cham, Switzerland, 2015; pp. 234–241. [Google Scholar]
  41. Çiçek, Ö.; Abdulkadir, A.; Lienkamp, S.S.; Brox, T.; Ronneberger, O. 3D U-Net: Learning dense volumetric segmentation from sparse annotation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention–MICCAI 2016: 19th International Conference, Athens, Greece, 17–21 October 2016; pp. 424–432. [Google Scholar]
  42. Caruna, R. Multitask learning: A knowledge-based source of inductive bias1. In Proceedings of the ICML’93: Proceedings of the Tenth International Conference on International Conference on Machine Learning, Amherst, MA, USA, 27–29 July 1993; pp. 41–48. [Google Scholar]
  43. Chen, Z.; Badrinarayanan, V.; Lee, C.Y.; Rabinovich, A. Gradnorm: Gradient normalization for adaptive loss balancing in deep multitask networks. In Proceedings of the International Conference on Machine Learning, Macau, China, 26–28 February 2018; pp. 794–803. [Google Scholar]
  44. Désidéri, J.A. Multiple-gradient descent algorithm (MGDA) for multiobjective optimization. Comptes Rendus Math. 2012, 350, 313–318. [Google Scholar] [CrossRef]
  45. Sener, O.; Koltun, V. Multi-task learning as multi-objective optimization. Adv. Neural Inf. Process. Syst. 2018, 31. [Google Scholar]
  46. Karush, W. Minima of Functions of Several Variables with Inequalities as Side Constraints. Master’s Thesis, Department of Mathematics, University of Chicago, Chicago, IL, USA, 1939. [Google Scholar]
  47. Bertsekas, D.P. Nonlinear programming. J. Oper. Res. Soc. 1997, 48, 334. [Google Scholar] [CrossRef]
  48. Lin, X.; Zhen, H.L.; Li, Z.; Zhang, Q.F.; Kwong, S. Pareto multi-task learning. Adv. Neural Inf. Process. Syst. 2019, 32. [Google Scholar]
  49. Demir, U.; Unal, G. Patch-based image inpainting with generative adversarial networks. arXiv 2018, arXiv:1803.07422. [Google Scholar]
  50. Heusel, M.; Ramsauer, H.; Unterthiner, T.; Nessler, B.; Hochreiter, S. Gans trained by a two time-scale update rule converge to a local nash equilibrium. Adv. Neural Inf. Process. Syst. 2017, 30. [Google Scholar]
  51. Bińkowski, M.; Sutherland, D.J.; Arbel, M.; Gretton, A. Demystifying mmd gans. arXiv 2018, arXiv:1801.01401. [Google Scholar]
  52. Zhang, R.; Isola, P.; Efros, A.A.; Shechtman, E.; Wang, O. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 586–595. [Google Scholar]
  53. Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2818–2826. [Google Scholar]
  54. Obukhov, A.; Seitzer, M.; Wu, P.W.; Zhydenko, S.; Kyl, J.; Lin, E.Y.J. High-Fidelity Performance Metrics for Generative Models in PyTorch; Version: 0.3.0; Zenodo: Geneva, Switzerland, 2020. [Google Scholar] [CrossRef]
Figure 1. An example of image synthesis using our proposed method.
Figure 2. Example slices of the 3D image sequence. The whole vasculature is divided into fixed vascular pieces at the input. The third and fourth figures represent the reconstructed liver pieces and the whole liver, respectively.
Figure 3. Overview of the proposed multi-task image translation model.
Figure 4. Visualization of the quantitative assessment results. We omit the CycleGAN results because they are not informative. Compared to the other models, our model demonstrates a clear superiority.
Figure 5. The visualization of the 2D CT synthesis image.
Figure 6. The visualization of the 3D CT synthesis image.
Figure 7. The visualization of the 3D liver reconstruction.
Figure 8. Two inputs (i.e., two channels or three channels) for the discriminator.
Figure 9. The input of the discriminator. The first one treats the synthetic liver mask, vessel mask and synthetic liver as the three channels of the image. The second input treats the result of the synthetic liver and synthetic liver mask after multiplying them (for filtering the background) as one channel, and the vessel mask as another channel.
Figure 10. Different methods for visualization of synthetic images. We extracted the synthesis results of different models at the same stage of early training using the same section scanning position. It can be seen that our model can quickly capture the key areas of the image in the early stage, and start synthesizing basic textures earlier. Moreover, in the later stages, the model can still maintain relatively correct edge shapes.
Figure 11. Visualization of synthetic images containing lesioned areas, based on the single-task generators and our multi-task generator. After the masking operation using the synthetic mask, the discriminator's attention is focused on the liver region; this gap is particularly prominent in the tomography results of patients with advanced cancer (columns 2 and 3 in the figure), because it is difficult to acquire accurate vascular structures from such lesioned vascular images.
Table 1. Configuration of the generator encoder.

Input Size            320 × 320 × 64    320 × 320 × 48    320 × 320 × 32
Generator encoder     160 × 160 × 32    160 × 160 × 24    160 × 160 × 32
                      80 × 80 × 16      80 × 80 × 12      80 × 80 × 16
                      40 × 40 × 8       40 × 40 × 6       40 × 40 × 8
                      20 × 20 × 4       20 × 20 × 3       20 × 20 × 4
                      10 × 10 × 2       10 × 10 × 1       10 × 10 × 2
Bottleneck            5 × 5 × 1         5 × 5 × 1         5 × 5 × 1
Table 2. Configuration of the patch discriminator.

Input Size       320 × 320 × 64    320 × 320 × 48    320 × 320 × 32
Discriminator    160 × 160 × 32    160 × 160 × 24    160 × 160 × 16
                 80 × 80 × 16      80 × 80 × 12      80 × 80 × 8
                 40 × 40 × 8       40 × 40 × 6       40 × 40 × 4
                 20 × 20 × 4       20 × 20 × 3       20 × 20 × 2
Output Size      10 × 10 × 4       10 × 10 × 3       10 × 10 × 2
Table 3. Number of input sequences for the segmentation task.

Segmentation Layers                  64 s    48 s    32 s
Number of liver sequence segments    1621    2323    3579
Table 4. Comparisons of all methods. * indicates the best evaluated metric in the current round of training; bold font indicates the best overall value obtained across all experiments.

Number    Model               FID (dim = 2048)    KID       LPIPS     DSC
1         CGAN_64s            101.459             0.1172    0.180     0.865
2         Pix2Pix_64s         85.045              0.097     0.174 *   0.867
3         CycleGAN_64s        245.226             0.307     0.449     0.599
4         MTv2l_64s (Ours)    79.295 *            0.086 *   0.177     0.872 *
5         CGAN_48s            98.485              0.1168    0.186     0.863
6         Pix2Pix_48s         111.586             0.136     0.192     0.856
7         CycleGAN_48s        195.270             0.234     0.422     0.614
8         MTv2l_48s (Ours)    61.900 *            0.062 *   0.181 *   0.865 *
9         CGAN_32s            109.561             0.129     0.194     0.855
10        Pix2Pix_32s         107.912             0.124     0.186 *   0.833
11        CycleGAN_32s        297.366             0.344     0.515     0.464
12        MTv2l_32s (Ours)    85.992 *            0.094 *   0.187     0.858 *

