Unsupervised Domain Adaptation via Weighted Sequential Discriminative Feature Learning for Sentiment Analysis

Badr, Haidi; Wanas, Nayer; Fayek, Magda

doi:10.3390/app14010406

Open AccessArticle

Unsupervised Domain Adaptation via Weighted Sequential Discriminative Feature Learning for Sentiment Analysis

by

Haidi Badr

^1,*,

Nayer Wanas

¹ and

Magda Fayek

²

¹

Informatics Department, Electronics Research Institute, Cairo P.O. Box 12622, Egypt

²

Computer Engineering Department, Faculty of Engineering, Cairo University, Giza P.O. Box 12613, Egypt

^*

Author to whom correspondence should be addressed.

Appl. Sci. 2024, 14(1), 406; https://doi.org/10.3390/app14010406

Submission received: 9 October 2023 / Revised: 24 December 2023 / Accepted: 27 December 2023 / Published: 1 January 2024

(This article belongs to the Special Issue Application of Machine Learning in Text Mining)

Download

Browse Figures

Versions Notes

Abstract

:

Unsupervised domain adaptation (UDA) presents a significant challenge in sentiment analysis, especially when faced with differences between source and target domains. This study introduces Weighted Sequential Unsupervised Domain Adaptation (WS-UDA), a novel sequential framework aimed at discovering more profound features and improving target representations, even in resource-limited scenarios. WS-UDA utilizes a domain-adversarial learning model for sequential discriminative feature learning. While recent UDA techniques excel in scenarios where source and target domains are closely related, they struggle with substantial dissimilarities. This potentially leads to instability during shared-feature learning. To tackle this issue, WS-UDA employs a two-stage transfer process concurrently, significantly enhancing model stability and adaptability. The sequential approach of WS-UDA facilitates superior adaptability to varying levels of dissimilarity between source and target domains. Experimental results on benchmark datasets, including Amazon reviews, FDU-MTL datasets, and Spam datasets, demonstrate the promising performance of WS-UDA. It outperforms state-of-the-art cross-domain unsupervised baselines, showcasing its efficacy in scenarios with dissimilar domains. WS-UDA’s adaptability extends beyond sentiment analysis, making it a versatile solution for diverse text classification tasks.

Keywords:

unsupervised domain adaptation; poor target resources; text classification; transfer learning

1. Introduction

Recent years have witnessed a proliferation of deep learning methods that significantly enhance various machine learning problems, especially in Natural Language Processing (NLP) tasks such as sentiment analysis [1,2]. While conventional deep learning stands as the primary and impactful technique for sentiment analysis [3], its effectiveness relies on the assumption that the distributions of training and testing data share commonalities and originate from the same source. Unfortunately, this assumption often proves incorrect, leading to divergence in the distribution and characteristics of training and testing data sets. This divergence stems from factors such as the (i) prior shift, (ii) co-variate shift, and (iii) concept shift [4,5]. In the context of sentiment analysis, a tangible example of prior shift could be observed in the evaluation of a product’s reviews concerning its price before and after a sale. This shift in prior probabilities can significantly impact the interpretation of sentiment, as the change in context may lead to different sentiments being expressed by reviewers based on the altered pricing conditions. On the other hand, the co-variate shift emerges when sample selection bias impacts proper generalization from the learned model. This issue arises when the training data lacks adequate randomization, resulting in an unrepresentative sample. For instance, bias could occur by only selecting either low or high-rated reviews in sentiment analysis. Finally, concept shift occurs when data properties, like variance, mean, and co-variance, change over time. This is evident when sentiment expressed in social media posts evolves or fluctuates over time. This poses a challenge for sentiment analysis models that rely on historical data, as the underlying patterns and relationships may no longer accurately represent the current state of sentiment.

Instances of these shifts create distinct distributions between the source and target data sets, known as domain shift. This domain shift significantly diminishing the efficacy of conventional statistical learning techniques, highlighting the importance of acquiring labeled data sets that accurately represent the new distribution. However, the challenge lies in the cost and time constraints associated with collecting these new data sets, which must faithfully capture the updated distribution [6,7].

In tackling this challenge, researchers have explored different approaches and techniques with the goal of enhancing the efficiency of learning models by addressing variations in data distributions across training and testing domains [5,6]. One suggested solution involves alleviating domain disparities across domains through the implementation of what is known as domain adaptation techniques. Domain adaptation (DA) entails the identification of shared-features and the minimization of discrepancies in feature distributions across different domains [5]. In the context of sentiment analysis or other natural language processing tasks, domains may represent different sources of text data with distinct characteristics, such as different vocabulary, or subject. There are three main categories of domain adaptation, distinguished by the availability of labeled data: (i) supervised domain adaptation (SDA), (ii) semi-supervised domain adaptation (SSDA), and (iii) unsupervised domain adaptation (UDA). SDA involves labeled data in both the source and target domains which assumes labeled data exists in both the source and target domains, allowing for fine-tuning of the model [8]. SSDA deals with a limited amount of labeled data in the target domain, and the model is trained using both labeled data from the source domain and the available labeled data from the target domain [6,9]. UDA assumes no labeled data in the target domain, requiring unsupervised techniques to align feature distributions between the source and target domains [4,10,11]. UDA has gained significant importance in machine learning research [2,12,13]. This is primarily because UDA alleviates the requirement for labeled data in the target domain, which might be inaccessible or come with high costs. This quality renders UDA an invaluable approach, allowing models to adapt to new domains [14]. Acknowledging its significance and wide-ranging applicability, this paper will focus on the exploration and investigation of UDA techniques.

Unsupervised Domain Adaptation (UDA) is geared towards transferring knowledge from a labeled domain (source domain) to an unlabeled domain (target domain) to enhance performance in the latter [4,15]. Various techniques, including domain adversarial learning, have been employed to align feature representations [16,17], collectively aiming to facilitate effective model adaptation. Adversarial-based UDA approach involves training a domain classifier to distinguish between source and target domains while simultaneously training a sentiment classifier to predict sentiment labels. The goal is to learn domain-invariant features that are informative for sentiment analysis [18]. However, the challenge arises when the target domain has limited resources, making it uncertain that the learned domain-invariant representation contains sufficient relevant features. This issue becomes more pronounced when there is significant dissimilarity between the source and target domains, potentially leading to unstable training. To address this concern and propose a pragmatic UDA approach, we present the Weighted Sequential UDA framework (WS-UDA).

WS-UDA aims to enhance the understanding of the target domain and elevate target performance through a weighted sequential framework. This training approach unfolds in two consecutive stages, each serving a specific purpose in the adaptation process. The initial stage employs Unsupervised Domain Adaptation (UDA) techniques, seeking to obtain a shared-features representation by utilizing both labeled source and unlabeled target data sets. However, the challenge arises in this first stage, as the shared representation introduces uncertainty about capturing all relevant features from the target domain. This uncertainty may lead to model instability, especially in scenarios with a substantial gap between the source and target domains.

To address this limitation and bolster the model’s stability and adaptability, the second stage of WS-UDA is introduced. In this subsequent phase, the shared representation obtained from the first stage serves as the starting point for a consecutive training stage. This informed starting point is anticipated to contribute to a more robust adaptation process, ultimately improving performance, particularly in scenarios marked by significant dissimilarities between the source and target domains.The contributions in this work are summarized as follows:

This work proposes WS-UDA, a sequential architecture designed to enhance domain-invariant feature representation learning in a manner akin to fine-tuning. It accomplishes sentiment analysis in an unsupervised domain adaptation framework.
WS-UDA model exhibits the capability to identify deeper features beneficial to the target domain, leading to improved overall performance.
The experiments conducted on three benchmark data sets, encompassing both single-source/single-target domain and multi-source/single-target domain settings, reveal notable enhancements in the performance of unsupervised domain adaptation when employing the proposed WS-UDA model
Experiments also demonstrate that a greater percentage of improvement can be achieved the more distant the domains.

The subsequent sections are organized as follows: Section 2 offers a comprehensive review of unsupervised domain adaptation. In Section 3, we articulate the problem statement and motivation. Section 4 introduces the proposed Weighted Sequential Unsupervised Domain Adaptation (WS-UDA) framework, delving into its architectural design and key components. Moving forward, Section 5 is dedicated to detailing the experimental setup and presenting empirical results derived from rigorous evaluations, accompanied by insightful comparisons against existing state-of-the-art approaches. Finally, Section 6 encapsulates a summary of the findings and contributions of this work, concluding with a discussion on potential directions for future investigations.

2. Related Works

Figure 1 depicts the categorization of recent methods for unsupervised domain adaptation approaches. These methods are categorized into two main groups: (i) shallow domain adaptation approaches, and (ii) deep domain adaptation approaches. The subsequent discourse provides a comprehensive exploration of each category, presenting in-depth insights into the varied strategies employed.

2.1. Shallow Domain Adaptation Approaches

Shallow approaches seek to bridge the gap between the source and target domains by aligning the domain distribution via either (i) instance weighting techniques or (ii) feature transformation techniques.

Instance-based approaches resolve the domain adaptation problem by applying importance weighting techniques to correct the bias between the source and target domains. Thus, the source domain samples are re-weighted based on the density ratio of the source and target domains,

w (x) = \frac{p_{T} (x)}{p_{S} (x)}

, where

w (x)

is the re-weighting factor for the samples in the source domain [19]. Considering a basic scenario, where we need to create a model to identify cancer in a region with an extensive elderly population. Only a few examples from the target domain are supplied, and comparable statistics from other regions with a large youthful population are also provided. Transferring all the data from another region might not be successful because the elderly have a higher risk of cancer in comparison to a younger population. To reduce the difference between source and target samples, the straightforward approach is to assign weights to the instances in the source domain. The dimensionality problem impacts the instance re-weighting method, since it can be challenging to measure alignment in high-dimensional environments. Moreover, all instance-based techniques require some labeled data from the same distribution as the test data to perform effectively. Thus, it is not applicable within an unsupervised domain adaptation setting.

Feature-based adaptation techniques, on the other hand, aim to transform the original features into a new feature space. Thereon, performing an optimization technique to reduce the gap between domains in the contemporary representation space. This feature transformation can be achieved through different techniques, including feature alignment [20], feature clustering [21], feature selection [22], and feature reconstruction techniques [23]. Techniques must be robust enough to handle the complexity of diverse feature spaces. In addition, there is a risk of over-fitting to the source domain or aligning features too closely, which might not generalize well to the target domain. It is also important to note that feature-based domain adaptation assumes the existence of a domain-invariant mapping from the source domain to the target domain. This assumption may not always hold true in practice, particularly for complex or non-linearly separable data distributions. In such cases, alternative approaches like deep adversarial models, or GANs may be more suitable.

2.2. Deep Domain Adaptation Approaches

Modern domain adaptation strategies primarily leverage advanced deep architectures to address challenges associated with domain shift, as highlighted in [24,25,26,27]. These strategies are broadly categorized into two main groups: (i) traditional deep domain adaptation and (ii) adversarial deep domain adaptation. The upcoming section provides a succinct exploration of specific strategies within these categories.

2.2.1. Traditional Deep Domain Adaptation

Traditional deep domain adaptation approaches, as classified by the representation of domain discrepancy [28], are divided into two main categories: (i) discrepancy-based and (ii) reconstruction-based [29,30].

Discrepancy-based approaches focus on aligning domain distributions through the minimization of a distance metric within deep network architectures. For instance, ref. [31] introduces the Deep Adaptation Network (DAN), which employs a kernel-based approach utilizing Reproducing Kernel Hilbert Space (RKHS) to effectively align higher-order statistics of the distributions. DAN utilizes the Multi-Kernel of Maximum Mean Discrepancy (MK-MMD) to measure RKHS distance between the mean embedding of source and target distributions, emphasizing marginal distribution alignment without considering conditional distribution discrepancy. Alternatively, another MMD-based distribution matching technique is proposed by [32], Deep Transfer Network (DTN). DTN considers both the conditional and marginal distributions. Unlike using the MMD-based distribution matching [33], DTN utilizes the joint distribution difference of multiple layers to learn the transferable features by proposing the Joint Adaptation Network (JAN). It is a form of unsupervised domain adaptation. However, training and implementing DTN might be computationally expensive and complex due to its utilization of multiple layers and joint distribution matching. Another UDA approach, Deep CORAL (DCORAL), adopts the correlation alignment (CORAL) discrepancy measure to achieve deep domain adaptation. DCORAL aims to learn a representation shared by the source and the target by learning the non-linear transformation that aligns the correlations of the layers. It is designed for unsupervised domain adaptation. However, DCORAL assumes a linear transformation to align correlations, which may not capture highly non-linear relationships between domains.

Reconstruction-based approaches, on the other hand, are centered around mitigating distribution disparities between domains by minimizing the reconstruction error while acquiring a domain-invariant representation in an intermediate feature space representation [23,26,34,35]. Auto-encoders play a crucial role in this category, where encoder parameters are learned based on source domain samples, and a decoder is utilized to reconstruct target domain samples [26]. A specific instance of this category is the Separation Network (DSN) introduced by [36], which utilizes a scale-invariant Mean Squared Error (MSE) reconstruction loss to enhance the learning of invariant representations. Notably, DSN is designed for use exclusively in supervised domain adaptation scenarios. Another approach in this category is Transfer Learning with Deep Auto-encoders (TLDA) proposed by [29,30]. TLDA employs two auto-encoders for the source and target domains. It utilizes the auto-encoder as a feature extractor for final predictions, and the target classifier is trained on labeled examples using the feature representation accessible by the first-layer output of the encoder. However, it necessitates labeled examples for the target domain to train the target classifier, limiting its application in fully unsupervised scenarios. Additionally, Causal Auto-Encoder (CAE), presented by [23], integrates deep auto-encoder and causal structure learning to derive causal representations using available data. CAE is typically applied in unsupervised domain adaptation settings. One notable drawback of auto-encoder-based techniques is the limited incorporation of linguistic information in the induced representations, potentially restricting their interpretability in language-related tasks.

2.2.2. Adversarial Deep Domain Adaptation

Adversarial domain adaptation, a recent technique, has made significant strides in addressing domain-adaptation challenges, demonstrating notable progress in demanding tasks [37,38,39]. Many researchers have utilized adversarial learning as a powerful domain-invariant feature extractor for domain adaptation due to its popularity [37,40,41,42,43]. The Generative Adversarial Networks (GANs) [44], are the significant motivation for adversarial domain adaptation. GANs attempt to decrease the cross-domain disparity by learning a deep invariant representation across domains through mini-max optimization. While the original GAN framework is inherently unsupervised, modifications and extensions allow GANs to be applied in various supervised and semi-supervised scenarios. Building upon this concept, ref. [45] proposed Domain-Adversarial Neural Networks (DANNs), a deep adversarial-based DA approach. Several variants have emerged to capitalize on adversarial learning in domain adaptation [16,43,46,47]. For instance, ref. [16] propose Adversarial Discriminative Domain Adaptation (ADDA), which aims to reduce classification, soft label, domain classifier, and domain confusion losses to align cross-domains distributions.

Adversarial learning in unsupervised domain adaptation has proven effective in scenarios with closely related source and target domains. However, challenges arise when significant dissimilarities exist between the two domains, leading to potential instability during shared-feature learning [48,49]. In this case, adversarial learning approaches do not guarantee comprehensive coverage of relevant patterns and information present in the target domain. To address this limitation, we propose a Weighted Sequential Unsupervised Domain Adaptation (WS-UDA) framework that incorporates weight initialization to enhance model stability and adaptability.

3. Motivation

The growing prevalence of machine learning models, especially in sentiment analysis, has spurred the exploration of Unsupervised Domain Adaptation (UDA) techniques. Sentiment analysis models, proficient in understanding sentiments within specific domains, face challenges when applied to diverse contexts due to domain shift. This shift arises from significant differences between the characteristics of source and target domains. For instance, a sentiment analysis model trained on kitchen product reviews may struggle when applied to book reviews, given the variations in language and expressed sentiments. While adversarial learning has demonstrated effectiveness in domain adaptation, it falls short of ensuring comprehensive coverage of target domain information. There are instances where the model may not fully capture the nuances of the target domain during adversarial training. To overcome this limitation, we introduce the Weighted Sequential Unsupervised Domain Adaptation (WS-UDA) framework. This framework is based on retraining the model with carefully initialized weights, providing a more effective starting point for the adaptation process. Retraining with appropriate weight initialization provides various advantages, such as enhanced stability and convergence. Initializing the model with weights representing shared features, rather than using random initialization, ensures stability and establishes a controlled starting point. This aids in converging to a solution that integrates features from both domains. Furthermore, improved generalization is achieved, as a model initialized with informative weights from both domains is more likely to generalize effectively to unseen data in the target domain. The proposed WS-UDA framework aims to advance adversarial learning UDA techniques by prioritizing the integration of target domain features, thereby enhancing the model’s ability to generalize across diverse domains. The following sections will review our proposed framework’s details, followed by the experimental results and analysis.

4. Weighted Sequential Unsupervised Domain Adaption Framework

The proposed Weighted Sequential Unsupervised Domain Adaptation (WS-UDA) framework introduces a sequential feature extraction approach for shared features obtained from both labeled source and unlabeled target domains. Built upon adversarial learning, a fundamental technique for acquiring domain-transferable features in robust deep neural networks, WS-UDA follows an integrated structure comprising two consecutive UDA steps working in tandem. The schematic representation of the suggested framework is outlined in Figure 2.

4.1. WS-UDA Step 1

Step 1 embodies a classical adversarial learning process in Unsupervised Domain Adaptation (UDA), Algorithm 1. It starts by embedding samples from both the source and target domains through an embedding layer, such as word2vec [50]. This embedded representation then undergoes adversarial learning, encompassing two integral substeps. Firstly, the source-labeled samples are utilized to train a feature extractor network. Subsequently, both source and target samples are fed into a domain classifier tasked with distinguishing between source and target domain features. In this adversarial interplay, the feature extractor network and the domain classifier engage in a minimax game. The primary objective is for the feature extractor to confound the domain classifier, minimizing its loss to the extent that the classifier cannot predict the domain of a sample based on its shared features. When the domain classifier fails to discern the domain of the input, the extracted features are deemed shared features, signifying the absence of domain-specific knowledge. At this pivotal point, the shared feature representation proves instrumental for representing the samples and training the label classifier, specifically for sentiment classification in our context. Within the realm of adversarial-based Unsupervised Domain Adaptation (UDA) techniques, the Multinomial Adversarial Networks (MANs) approach [43] stands out for its exceptional performance and adaptability, extending to tasks such as bilingual transfer learning [51]. Consequently, MAN serves as the foundation for developing and validating our WS-UDA model. Importantly, our proposed framework offers the flexibility to integrate any alternative UDA approach centered around extracting shared-feature representations, providing a viable substitute for MAN. The described scenario has demonstrated effectiveness when the source and target domains share a reasonable degree of similarity. However, challenges arise when the two domains exhibit substantial differences, leading to issues of stability and limited generalization to the target domain. Consequently, our proposed second step in the WS-UDA involves a retraining process to address this challenge.

4.2. WS-UDA Step 2

This step involves initializing the shared feature extractor with the weights obtained in the initial adversarial learning step, with the objective of extracting deeper features beneficial for the target domain. Through retraining, the aim is to leverage improved generalization to the target domain. Additionally, starting with information from both the source and target domains is expected to enhance the overall representation, providing a more robust foundation compared to scenarios without this retraining step. The process mirrors the first step, except that the previously learned weights serve as initialization for the shared-feature extractor. This is depicted in Figure 2, indicating that learning shared features is more effective with this initialization, as the shared feature extractor begins with higher proximity to the target domain. Consequently, the target domain is more consistently represented, leading to enhanced model performance. Experimental results compared to state-of-the-art methodologies demonstrate the effectiveness of WS-UDA by positively impacting performance.

Algorithm 1: Adversarial-based UDA algorithm. WS-UDA’s Step 1.

5. Experiments and Evaluation

The empirical evaluation of the WS-UDA framework’s performance is conducted in this section. To ensure a fair comparison and rule out the possibility that the observed benefits stem solely from the model’s complexity, we maintain an identical configuration for the MAN model in both steps of the framework, as outlined in Table 1. Consequently, any observed improvement is attributed to the framework’s capability to unveil additional deep features, enhancing performance through the proposed sequential approach. Like any deep neural network, all hyperparameters are determined through experimental tuning and optimization.

For the evaluation, we adopt the following settings: cross-validation is employed, wherein each domain of the data set is divided into five folds. Three folds serve as the training set, one as the test set, and the remaining data act as the validation set. The classification accuracy reported in subsequent sections represents the average accuracy over the five folds.

Two sets of assessments are conducted as follows: the first assessment set for the WS-UDA framework involves a multi-source/single-target configuration. Here, one of the four domains in the data set is identified as the target domain, while the other three serve as source domains. The second assessment set for the WS-UDA framework entails a single-source/single-target configuration. In this scenario, within each data set, one domain is designated as the source, and another as the target domain.

5.1. Data Sets

The Amazon reviews data set [52] serves as a widely referenced data set for sentiment analysis across various domains. This data set encompasses reviews from four distinct Amazon product categories: (i) electronics, (ii) DVDs, (iii) kitchen appliances, and (iv) books. Each product category is treated as a separate domain, featuring 2000 labeled samples and an average of 3000 unlabeled samples. Twelve cross-domain tasks are formulated, connecting each pair of domains, such as kitchen → books, …, books → kitchen. A Multi-Layer Perceptron (MLP) is utilized as a feature extractor with the pre-processed version of the Amazon Review data [43] for fair comparison. Each review is expressed as a 5000-dimensional feature vector using the most frequently used features, with feature values equal to feature frequencies.

The FDU-MTL data set [53] encompasses sixteen distinct domains, each containing reviews with binary sentiment labels. Within each domain, there are 1400 labeled samples and 2000 unlabeled samples, further divided into training (70%), validation (10%), and testing (20%). To provide an unbiased comparison to the competitors baselines methodologies that leverage this data set in UDA, three domains are chosen: baby, MR, and apparel. As a result, six transfer tasks are set up: baby → apparel, baby → MR, …, and apparel → MR. As a feature extractor architecture, a CNN with one convolution layer is used.

The Spam data set [54] is a text data collection that compromised of two subcategory from the UCI repository: the SMS-Spam data set [55] and the Enron Spam data set [56]. The Enron data set comprises 1440 labeled samples and 3500 unlabeled samples, consisting of emails exchanged among senior executives at Enron Corporation. In contrast, the SMS-Spam Collection data set contains short messages and encompasses 900 labeled samples and 1800 unlabeled samples.

5.2. Comparison Baselines

Two distinct sets of experiments were conducted to validate the WS-UDA framework’s efficiency. The first set of tests evaluates performance in a multi-source/single-target context, whereas the second set evaluates performance in a single-source/single-target setting.

Using the Amazon data set and the multi-source domain adaption configuration, we compare the results to the following baselines:
- msDA [25]: to learn new representations for domain adaptation, it leverages marginalized stacking denoising auto-encoders.
- DANN [41]: it proposes adversarial training as a domain adaptation representation learning approach.
- MDAN (S-MAX) and MDAN (H-MAX) [17]: are two adversarial neural models.
- MAN [43]: a multinomial adversarial networks.
- DACL [57]: dual adversarial co-learning approach for text classification.
Additionally, we benchmark our approach against the following studies, which contrasted their results utilizing the FDU-MTL data set with multi-source domain adaptation setting:
- Meta [58] a meta network for storing the knowledge shared by many associated tasks.
For single-source/single-target setting: using the Amazon data set, a comparison is made with the following competitors that mentioned their findings under comparable circumstance:
- DTFC [59]: an auto-encoder-based approach for extracting complicated feature representations by disturbing distinct features with varied corruption probabilities.
- TransNorm [60]: adversarial learning in which deep networks are trained in a conditional paradigm.
- ALDA [46]: combines both of self-training and domain-adversarial learning in a cohesive approach for learning domain-invariant feature representations.
- CAE [23]: an auto-encoder that integrates deep auto-encoding and causal structure-learning approaches into a single framework to generate causal representations from a single-source domain.

To the best of our knowledge, the state-of-the-art techniques that employ the FDU-MTL data set with a single-source/single-target setting choose three domains out of the 16: MR, apparel, and baby. As a result, we report WS-UDA’s performance in these three domains against the following benchmarks in order to establish a fair comparison:

DANN [41]: provides comparable latent features by adding numerous regular layers as well as a new gradient reversal layer to the model.
DataSel [61]: depends on choosing data based on the variance in complexity and domain similarity.
DistanceNet [59]: adds distance metrics as a loss function to be reduced alongside the total loss function.

Lastly, we compare our approach to the benchmarks listed below, which present their findings using the Spam data set:

LDADA [62]: employs on LDA for domain adaptation.
JDME-G [63]: is the Joint Distribution Matching Embedding (JDME) method for projecting samples onto a latent space.

5.3. Performance of WS-UDA in Multi-Source/Single-Target Setting

The performance of our proposed WS-UDA is compared to benchmark approaches for the Amazon and FDU-MTL data sets, as illustrated in Figure 3 and Figure 4, respectively. In each figure, the target domain is represented on the x-axis, while the label prediction accuracy for WS-UDA and comparison baselines is indicated on the y-axis.

For the Amazon data set, which comprises four domains, four separate transfer tasks are evaluated. For example, when the target unlabeled data is in the books domain, the source domains include DVD, electronics, and kitchen, resulting in the (DVD, electronics, kitchen) ⟶ books transfer task. This convention is consistent across all transfer tasks in the figures. Figure 3 demonstrates the performance using the Amazon data set. WS-UDA, with its weighted sequential architecture, consistently outperforms various baselines and the standalone MAN algorithm without sequential learning across all four transfer tasks. Compared to alternative baselines, WS-UDA improves the performance of individual transfer tasks by up to

3 %

and the overall performance of all four transfer tasks by

1.6 %

.

Similar observations can be made for the FDU-MTL data set, as shown in Figure 4. WS-UDA enhances the performance of individual transfer tasks by up to

6.5 %

and the overall performance of all sixteen transfer tasks by

2.8 %

. The significance lies in the effectiveness of the weighted sequential framework’s second stage, initialized by the shared feature extractor obtained from the first stage. This highlights WS-UDA’s potential to improve feature discriminability, leveraging the sequential approach to simplify the learning task and generalize better to the target domain. Figure 5 depicts the average similarity between different domains. Table 2 highlights the relationship between the performance gains achieved by WS-UDA and the average similarity across domains. As the average similarity decreases, WS-UDA exhibits notable improvements in adaptation performance, showcasing its effectiveness in scenarios where the target domain substantially differs from other domains in the dataset. This validates that weighted initialization has a positive impact on the adaptation goal.

5.4. Performance of WS-UDA in Single-Source/Single-Target Setting

The performance of our proposed WS-UDA compared to benchmark approaches for the Amazon, FDU-MTL, and Spam data sets is illustrated in Figure 6, Figure 7, Figure 8 and Figure 9, respectively.

In the Amazon data set, we have four domains, resulting in twelve separate transfer tasks. While there are six and two tasks for the FDU-MTL and Spam data sets, respectively. Figure 6 and Figure 7 illustrate the performance with the Amazon data set. As demonstrated, WS-UDA’s weighted sequential framework outperforms the baseline MAN algorithm without sequential learning and all other comparable baselines in the four transfer tasks. Compared to alternative baselines, WS-UDA enhances the performance of various individual transfer tasks by up to

5 %

and the overall performance of all twelve transfer tasks by

2.34 %

.

Similar improvements are shown with the WS-UDA approach using the FDU-MTL and Spam data sets, as shown, in Figure 8 and Figure 9, respectively. WS-UDA boosts the performance of individual transfer tasks by up to

4.1 %

for the FDU-MTL dataset and

2.88 %

for the Spam dataset. The average performance across all transfer tasks is

2.85 %

for the FDU-MTL dataset and

1.7 %

for the Spam dataset.

Based on the discussed findings regarding WS-UDA performance, it is important to note that different target domains exhibit varying rates of improvement. This variability is influenced by the similarity of these target domains to other source domains. To elucidate this relationship, we investigate the correlation between domain similarity and the performance of the WS-UDA algorithm. Figure 10 illustrates the average similarity between each domain and the other domains for the Amazon, FDU-MTL, and Spam datasets. For example, the correlation between the book domain and other domains in the Amazon dataset and the performance of the WS-UDA model is evident in Figure 10a. This observation aligns with the degree of improvement shown in Figure 6a, where the task DVD → books achieves the highest accuracy. In contrast, the task kitchen → books demonstrates limited performance gain, illustrating an inverse relationship between the performance of WS-UDA and the similarity between the source and target domains.

The average performance improvement achieved by the WS-UDA architecture, along with the average similarity of the domain to other domains, is summarized in Table 3. The normalized average cosine similarity for the three datasets is shown in Figure 11, demonstrating that WS-UDA can enhance adaptation performance the further apart the two domains are from each other. Thus, WS-UDA addresses the issue with modern UDA approaches, which require similarity between the two domains to avoid instability.

5.5. Performance of WS-UDA in Single-Source vs. Multi-Source Comparison

Table 4 compares the performance of WS-UDA in a single-source and multi-source setting. As previously outlined, there is a distinction between these two settings, that can be seen in the accuracy of the single-source scenario, which ranges from 70.9 to 80.6%, compared to the multi-source case, which spans from 76.8 to 87.5% across the two data sets. This is substantiated since using multiple sources enhances the variety of shared features, which thus improves the possibility of discovering features that are more relevant to the target domain. In other words, in the single-source/single-target scenario, the divergence between the labeled domain (one source domain) and the target domain is greater than the divergence between (many sources) and the target domain in the multi-source/single-target scenario. As a result, a target’s accuracy from several sources is higher than the accuracy of the same target from a single-source configuration. As was already mentioned, the further apart the two domains are from one another, WS-UDA can enhance adaption performance. It is obvious that this is the case here because the divergence is more in the single-source/single-target setting than it is in the multi-source/single-target setting. As a result, we can observe that our weighted sequential learning strategy is enhanced more in this situation, as shown in Table 4, than it is in the multi-source/single-target setting. With contemporary UDA techniques, which depend on the similarity between the two domains to prevent instability, WS-UDA overcomes the problem.

6. Ablation Study

The series of experiments conducted here are designed to substantiate the efficacy of the proposed WS-UDA framework’s two-stage training in preventing adaptation to specific features or characteristics inherent in the training data. The focus is on evaluating the framework’s ability to generalize robustly across diverse or previously unseen contexts, with a particular emphasis on the second stage of its learning. In these experiments, an unseen domain denoted as C is utilized to assess the representations acquired through a specific transfer task, specifically the A ⟶ B transfer task. Here, domain A serves as the labeled source domain, domain B represents the unlabeled target domain that is distinct in its learning of invariant features, and domain C is an unseen domain that the model encounters only during the testing phase.

The Without Sequential framework is specifically designed to extract invariant domain features between source domain A and target domain B. As detailed in the paper, this framework tends to generate insufficient feature representations, limiting optimal performance to scenarios where domains A and B share significant similarity. Consequently, achieving the best performance during the testing phase, whether evaluating with an unseen (C) or unlabeled (B) domain, hinges on a close relationship between the tested domain and the source domain A. On the contrary, according to the findings presented in this paper, the WS-UDA two-stage training method addresses this limitation of the Without Sequential framework. WS-UDA is designed to provide a more consistent feature representation specifically tailored to support the target domain. This assertion is substantiated by an overchasing analysis, revealing that WS-UDA’s performance improves as the similarity between the source and target domains decreases. As a result, we posit that this conclusion remains valid during testing, whether applied to the target domain (B) or extended to scenarios involving unseen (C) domain. Indeed, for unseen domain C, it can be concluded that the Without Sequential framework performs optimally when there is significant relevance between C and A. Conversely, the WS-UDA framework demonstrates greater enhancement when there is a higher degree of relevance between C and B. In essence, the performance of each method is influenced by the closeness of the respective domains in the source-target relationship, with Without Sequential excelling when A is closely related to C and WS-UDA excelling when B is closely related to C.

To reinforce our discussion, let us refer to Table 5, which delineates the outcomes in the context where C represents the apparel domain across two distinct transfer tasks; MR ⟶ baby and baby ⟶ MR. This table serves as a visual aid, presenting a tangible depiction of the performance metrics associated with the proposed the WS-UDA method in comparison to the Without Sequential framework. The first column in the table showcases the performance when one-stage learning is applied, while the second column details the performance achieved with the proposed WS-UDA method. The last column provides insights into the distance between the apparel test set and each source (A) and target (B) within the corresponding transfer task. It is worth noting that all distances have been normalized to enhance the reliability of comparisons. As evident, the performance of the Without Sequential framework appears to be linked to the distance between C and A. This observation is supported by the table, where the distance between apparel and the MR labeled set is notably smaller than that between apparel and the baby labeled set. Consequently, the performance of the apparel domain in the baby-MR transfer task is substantially higher (approximately 78.5%) than in the MR-baby transfer task (approximately 66%).

On the other hand, the WS-UDA model is anticipated to show performance improvement proportional to the distance between the apparel domain and the target of the corresponding transfer task. This expectation aligns well with the table’s results, indicating that apparel is notably more similar to the target (baby) compared to the target (MR). Notably, the improvement in performance observed in the MR ⟶ baby transfer task (2.75%) surpasses that in the baby ⟶ MR transfer task (1%). The consistent findings are evident in Table 6 for the remaining domains within the FDU-MTL dataset. Similarly, comparable conclusions can be drawn from Table 7 in the Amazon dataset, particularly when the unseen domain is C (electronics). It is worth noting that the same experimental trend is observed for other domains in the Amazon dataset, although these results are omitted here to optimize space.

These recurring observations across different datasets and domains reinforce the robustness and generalizability of the conclusions drawn from the analysis. The consistent performance trends support the argument that WS-UDA demonstrates effectiveness in learning domain-invariant features, indicating its potential to adapt to diverse or unseen contexts. This ability is crucial for models aiming to generalize well across different domains, making them more robust and applicable to a broader range of scenarios.

In essence, the model’s proficiency on domain C suggests its acquisition of domain-invariant features, enabling it to adeptly adapt to diverse or unfamiliar contexts.

7. Conclusions

In the context of unsupervised domain adaptation for sentiment analysis, our innovation, WS-UDA, introduces a distinctive sequential discriminative feature learning architecture. By employing a domain-adversarial learning model, WS-UDA excels in scenarios involving closely related source and target domains. However, challenges emerge in conventional unsupervised domain adaptation techniques when confronted with significant dissimilarities between source and target domains. This divergence can potentially lead to instability during shared-feature learning, limiting the model’s ability to ensure comprehensive coverage of relevant patterns and information present in the target domain.

To overcome these challenges, we present the WS-UDA framework, which addresses dissimilarity-related issues through the incorporation of weight initialization. This is achieved through two sequential steps designed to enhance model stability and adaptability. The sequential nature of this approach enriches the model’s adaptability to diverse domains and varying levels of dissimilarity between the source and target domains. WS-UDA thus offers a nuanced solution to the inherent challenges associated with adapting sentiment analysis models across dissimilar domains. WS-UDA has been evaluated using three benchmark datasets: Amazon reviews, FDU-MTL datasets, and Spam datasets. The evaluation includes both single-source/single-target and multi-source/single-target domain settings. The experimental results demonstrate the promising performance of WS-UDA, surpassing state-of-the-art cross-domain unsupervised baselines. Notably, the performance improvement achieved by WS-UDA is inversely related to the similarity between the source and target domains. Additionally, employing multiple source domains enhances the identification of target-relevant features, leading to improved performance as the distance between source and target datasets increases. While the experimental results of WS-UDA focus on sentiment analysis, the approach can be adapted to other text classification tasks or diverse target domains. However, the proposed system lacks a well-established hypothesis guiding the selection of hyperparameters and proper topology, common challenges in deep learning methodologies. As a result, the complexity may pose difficulties for less experienced individuals to embrace.

In concluding this study, several promising avenues for future research in the field of unsupervised domain adaptation (UDA) for text classification come to light. One such avenue involves investigating alternative model architectures and learning paradigms to enhance the robustness and generalizability of weakly supervised UDA. Additionally, extending the applicability of WS-UDA to a broader range of tasks, such as topic classification, intent recognition, presents an exciting prospect. Assessing the effectiveness of WS-UDA across different domains, including healthcare, or finance, will contribute to a more comprehensive understanding of its capabilities in various real-world scenarios. Another intriguing direction for future research is the exploration of combining WS-UDA with other UDA models to enhance overall adaptation performance. By delving into these research directions, we can advance the field of unsupervised domain adaptation for text classification and improve the performance of weakly supervised UDA methods. This exploration holds the promise of making UDA more effective and versatile across a wide array of applications and domains.

Author Contributions

Conceptualization, H.B. and N.W.; methodology, H.B.; software, H.B.; validation, H.B., N.W. and M.F.; formal analysis, H.B.; investigation, H.B.; writing—original draft preparation, H.B.; writing—review and editing, H.B. and N.W.; supervision, N.W. and M.F. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are openly available. Amazon data-set in [52], Fdu-MTL data-set in [53] and Spam data-set in [53].

Conflicts of Interest

The authors declare no conflict of interest.

References

Ramponi, A.; Plank, B. Neural Unsupervised Domain Adaptation in NLP—A Survey. In Proceedings of the 28th International Conference on Computational Linguistics, Barcelona, Spain, 8–11 December 2020; pp. 6838–6855. [Google Scholar] [CrossRef]
Alqahtani, Y.; Al-Twairesh, N.; Alsanad, A. A Comparative Study of Effective Domain Adaptation Approaches for Arabic Sentiment Classification. Appl. Sci. 2023, 13, 1387. [Google Scholar] [CrossRef]
Naeem, S.; Logofătu, D.; Muharemi, F. Sentiment Analysis by Using Supervised Machine Learning and Deep Learning Approaches. In Advances in Computational Collective Intelligence: 12th International Conference, ICCCI 2020, Da Nang, Vietnam, 30 November–3 December 2020; Hernes, M., Wojtkiewicz, K., Szczerbicki, E., Eds.; Springer: Cham, Switzerland, 2020; pp. 481–491. [Google Scholar]
Kouw, W.; Loog, M. A review of single-source unsupervised domain adaptation. arXiv 2019, arXiv:1901.05335. [Google Scholar]
Kong, Y.; Xu, Z.; Mei, M. Cross-Domain Sentiment Analysis Based on Feature Projection and Multi-Source Attention in IoT. Sensors 2023, 23, 7282. [Google Scholar] [CrossRef]
Mathapati, S.; Nafeesa, A.; Tanuja, R.; Manjula, S.; Venugopal, K. Semi-supervised domain adaptation and collaborative deep learning for dual sentiment analysis. SN Appl. Sci. 2019, 1, 907. [Google Scholar] [CrossRef]
Sharir, O.; Peleg, B.; Shoham, Y. The Cost of Training NLP Models: A Concise Overview. arXiv 2020, arXiv:2004.08900. [Google Scholar]
Motiian, S.; Piccirilli, M.; Adjeroh, D.; Doretto, G. Unified deep supervised domain adaptation and generalization. In Proceedings of the 2017 IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 5715–5725. [Google Scholar]
Xiao, M.; Guo, Y. Semi-supervised subspace co-projection for multi-class heterogeneous domain adaptation. In Proceedings of the Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Porto, Portugal, 7–11 September 2015; pp. 525–540. [Google Scholar]
Das, D.; Lee, C.G. Sample-to-sample correspondence for unsupervised domain adaptation. Eng. Appl. Artif. Intell. 2018, 73, 80–91. [Google Scholar] [CrossRef]
Gong, R.; Li, W.; Chen, Y.; Gool, L.V. Dlow: Domain flow for adaptation and generalization. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 2477–2486. [Google Scholar]
Tian, Q.; Zhou, J.; Chu, Y. Joint bi-adversarial learning for unsupervised domain adaptation. Knowl. Based Syst. 2022, 248, 108903. [Google Scholar] [CrossRef]
You, F.; Su, H.; Li, J.; Zhu, L.; Lu, K.; Yang, Y. Learning a weighted classifier for conditional domain adaptation. Knowl. Based Syst. 2021, 215, 106774. [Google Scholar] [CrossRef]
Liu, X.; Yoo, C.; Xing, F.; Oh, H.; El Fakhri, G.; Kang, J.W.; Woo, J. Deep unsupervised domain adaptation: A review of recent advances and perspectives. APSIPA Trans. Signal Inf. Process. 2022, 11, e25. [Google Scholar] [CrossRef]
Dai, Y.; Liu, J.; Ren, X.; Xu, Z. Adversarial Training Based Multi-Source Unsupervised Domain Adaptation for Sentiment Analysis. In Proceedings of the 34th AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; pp. 7618–7625. [Google Scholar]
Tzeng, E.; Hoffman, J.; Saenko, K.; Darrell, T. Adversarial discriminative domain adaptation. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 7167–7176. [Google Scholar]
Zhao, H.; Zhang, S.; Wu, G.; Costeira, J.P.; Moura, J.M.; Gordon, G.J. Multiple Source Domain Adaptation with Adversarial Training of Neural Networks. arXiv 2017, arXiv:1705.09684. [Google Scholar]
Toldo, M.; Maracani, A.; Michieli, U.; Zanuttigh, P. Unsupervised Domain Adaptation in Semantic Segmentation: A Review. Technologies 2020, 8, 35. [Google Scholar] [CrossRef]
Jiang, J.; Zhai, C. Instance weighting for domain adaptation in NLP. In Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, Prague, Czech Republic, 25–27 June 2007; pp. 264–271. [Google Scholar]
Fernando, B.; Habrard, A.; Sebban, M.; Tuytelaars, T. Unsupervised Visual Domain Adaptation Using Subspace Alignment. In Proceedings of the 2013 IEEE International Conference on Computer Vision, Sydney, NSW, Australia, 1–8 December 2013; pp. 2960–2967. [Google Scholar] [CrossRef]
Zhuang, F.; Luo, P.; Yin, P.; He, Q.; Shi, Z. Concept learning for cross-domain text classification: A general probabilistic framework. In Proceedings of the 23rd International Joint Conference on Artificial Intelligence, Beijing, China, 3–9 August 2013; pp. 1960–1966. [Google Scholar]
Ben-David, E.; Rabinovitz, C.; Reichart, R. Perl: Pivot-based domain adaptation for pre-trained deep contextualized embedding models. Trans. Assoc. Comput. Linguist. 2020, 8, 504–521. [Google Scholar] [CrossRef]
Yang, S.; Yu, K.; Cao, F.; Liu, L.; Wang, H.; Li, J. Learning causal representations for robust domain adaptation. IEEE Trans. Knowl. Data Eng. 2023, 35, 2750–2764. [Google Scholar] [CrossRef]
Yosinski, J.; Clune, J.; Bengio, Y.; Lipson, H. How transferable are features in deep neural networks? In Proceedings of the 27th Advances in Neural Information Processing Systems, Montreal, QC, Canada, 8–13 December 2014; pp. 3320–3328. [Google Scholar]
Chen, M.; Xu, Z.; Weinberger, K.Q.; Sha, F. Marginalized denoising autoencoders for domain adaptation. In Proceedings of the 29th International Conference on Machine Learning, Edinburgh, UK, 26 June–1 July 2012; pp. 1627–1634. [Google Scholar]
Vincent, P.; Larochelle, H.; Bengio, Y.; Manzagol, P.A. Extracting and composing robust features with denoising autoencoders. In Proceedings of the 25th International Conference on Machine Learning, Helsinki, Finland, 5–9 July 2008; pp. 1096–1103. [Google Scholar]
Redko, I.; Morvant, E.; Habrard, A.; Sebban, M.; Bennani, Y. A survey on domain adaptation theory: Learning bounds and theoretical guarantees. arXiv 2020, arXiv:2004.11829. [Google Scholar]
Farahani, A.; Voghoei, S.; Rasheed, K.; Arabnia, H.R. A brief review of domain adaptation. In Advances in Data Science and Information Engineering: Proceedings from ICDATA 2020 and IKE 2020; Springer: Cham, Switzerland, 2021; pp. 877–894. [Google Scholar]
Zhuang, F.; Cheng, X.; Luo, P.; Pan, S.J.; He, Q. Supervised representation learning: Transfer learning with deep autoencoders. In Proceedings of the 24th International Joint Conference on Artificial Intelligence, Buenos Aires, Argentina, 25–31 July 2015; pp. 4119–4125. [Google Scholar]
Zhuang, F.; Cheng, X.; Luo, P.; Pan, S.J.; He, Q. Supervised representation learning with double encoding-layer autoencoder for transfer learning. ACM Trans. Intell. Syst. Technol. 2017, 9, 1–17. [Google Scholar] [CrossRef]
Long, M.; Cao, Y.; Wang, J.; Jordan, M.I. Learning transferable features with deep adaptation networks. In Proceedings of the 32nd International Conference on International Conference on Machine Learning, Lille, France, 6–11 July 2015; pp. 97–105. [Google Scholar]
Zhang, X.; Yu, F.X.; Chang, S.F.; Wang, S. Deep transfer network: Unsupervised domain adaptation. arXiv 2015, arXiv:1503.00591. [Google Scholar]
Long, M.; Zhu, H.; Wang, J.; Jordan, M.I. Deep transfer learning with joint adaptation networks. In Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017; pp. 2208–2217. [Google Scholar]
Glorot, X.; Bordes, A.; Bengio, Y. Domain adaptation for large-scale sentiment classification: A deep learning approach. In Proceedings of the 28th International Conference on Machine Learning, Bellevue, WA, USA, 28 June–2 July 2011; pp. 513–520. [Google Scholar]
Guo, H.; Pasunuru, R.; Bansal, M. Multi-source domain adaptation for text classification via distancenet-bandits. In Proceedings of the 34th AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; pp. 7830–7838. [Google Scholar]
Bousmalis, K.; Trigeorgis, G.; Silberman, N.; Krishnan, D.; Erhan, D. Domain separation networks. In Proceedings of the 30th Advances in Neural Information Processing Systems, Barcelona, Spain, 5–10 December 2016; pp. 343–351. [Google Scholar]
Li, Z.; Zhang, Y.; Wei, Y.; Wu, Y.; Yang, Q. End-to-end adversarial memory network for cross-domain sentiment classification. In Proceedings of the 26th International Joint Conference on Artificial Intelligence, Melbourne, Australia, 19–25 August 2017; pp. 2237–2243. [Google Scholar]
Zhang, Y.; Liu, T.; Long, M.; Jordan, M. Bridging theory and algorithm for domain adaptation. In Proceedings of the 36th International Conference on Machine Learning, Long Beach, CA, USA, 10–15 June 2019; pp. 7404–7413. [Google Scholar]
Zhu, J.Y.; Park, T.; Isola, P.; Efros, A.A. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the 2017 IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2223–2232. [Google Scholar]
Wang, Y.; Huang, J.; Shang, J.; Niu, C.; Zhou, Z. Domain Invariant and Class Discriminative Heterogeneous Domain Adaptation. In Proceedings of the 2018 IEEE 3rd International Conference on Communication and Information Systems (ICCIS), Singapore, 28–30 December 2018; pp. 227–231. [Google Scholar]
Zhou, J.T.; Zhao, H.; Peng, X.; Fang, M.; Qin, Z.; Goh, R.S.M. Transfer hashing: From shallow to deep. IEEE Trans. Neural Netw. Learn. Syst. 2018, 29, 6191–6201. [Google Scholar] [CrossRef]
Ganin, Y.; Ustinova, E.; Ajakan, H.; Germain, P.; Larochelle, H.; Laviolette, F.; Marchand, M.; Lempitsky, V. Domain-adversarial training of neural networks. J. Mach. Learn. Res. 2016, 17, 1–35. [Google Scholar]
Chen, X.; Cardie, C. Multinomial Adversarial Networks for Multi-Domain Text Classification. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, New Orleans, LO, USA, 1–6 June 2018; pp. 1226–1240. [Google Scholar]
Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial nets. In Proceedings of the 27th Advances in Neural Information Processing Systems, Montreal, QC, Canada, 8–13 December 2014; pp. 2672–2680. [Google Scholar]
Ganin, Y.; Lempitsky, V. Unsupervised domain adaptation by backpropagation. In Proceedings of the 32nd International Conference on Machine Learning, Lille, France, 6–11 July 2015; pp. 1180–1189. [Google Scholar]
Chen, M.; Zhao, S.; Liu, H.; Cai, D. Adversarial-learned loss for domain adaptation. In Proceedings of the 34th AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; pp. 3521–3528. [Google Scholar]
Shen, J.; Qu, Y.; Zhang, W.; Yu, Y. Wasserstein distance guided representation learning for domain adaptation. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence, New Orleans, LO, USA, 2–7 February 2018; pp. 4058–4065. [Google Scholar]
Han, X.; Eisenstein, J. Unsupervised domain adaptation of contextualized embeddings for sequence labeling. arXiv 2019, arXiv:1904.02817. [Google Scholar]
Cai, G.; Wang, Y.; He, L.; Zhou, M. Unsupervised domain adaptation with adversarial residual transform networks. IEEE Trans. Neural Netw. Learn. Syst. 2019, 31, 3073–3086. [Google Scholar] [CrossRef]
Mikolov, T.; Chen, K.; Corrado, G.S.; Dean, J. Efficient Estimation of Word Representations in Vector Space. In Proceedings of the International Conference on Learning Representations (ICLR), Scottsdale, AZ, USA, 2–4 May 2013. [Google Scholar]
Chen, X.; Hassan, A.; Hassan, H.; Wang, W.; Cardie, C. Multi-Source Cross-Lingual Model Transfer: Learning What to Share. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 30 July–4 August 2019; pp. 3098–3112. [Google Scholar]
Blitzer, J.; Dredze, M.; Pereira, F. Domain adaptation for sentiment classification. In Proceedings of the 45th Anniversary Meeting of the Association Computational Linguistics (ACL’07), Prague, Czech Republic, 23–30 June 2007; pp. 440–447. [Google Scholar]
Liu, P.; Qiu, X.; Huang, X.J. Adversarial Multi-task Learning for Text Classification. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, Vancouver, BC, Canada, 30 July–4 August 2017; pp. 1–10. [Google Scholar]
Kouw, W.M.; Van Der Maaten, L.J.; Krijthe, J.H.; Loog, M. Feature-level domain adaptation. J. Mach. Learn. Res. 2016, 17, 5943–5974. [Google Scholar]
Almeida, T.A.; Hidalgo, J.M.G.; Yamakami, A. Contributions to the study of SMS spam filtering: New collection and results. In Proceedings of the 11th ACM Symposium on Document Engineering, Mountain View, CA, USA, 19–22 September 2011; pp. 259–262. [Google Scholar]
Klimt, B.; Yang, Y. Introducing the Enron corpus. In Proceedings of the Conference on Email and Anti-Spam, Mountain View, CA, USA, 30–31 July 2004. [Google Scholar]
Wu, Y.; Guo, Y. Dual adversarial co-learning for multi-domain text classification. In Proceedings of the 34th AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; pp. 6438–6445. [Google Scholar]
Chen, J.; Qiu, X.; Liu, P.; Huang, X. Meta multi-task learning for sequence modeling. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence, New Orleans, LO, USA, 2–7 February 2018; pp. 5070–5077. [Google Scholar]
Wei, P.; Ke, Y.; Goh, C.K. Feature analysis of marginalized stacked denoising autoenconder for unsupervised domain adaptation. IEEE Trans. Neural Netw. Learn. Syst. 2018, 30, 1321–1334. [Google Scholar] [CrossRef] [PubMed]
Wang, X.; Jin, Y.; Long, M.; Wang, J.; Jordan, M.I. Transferable normalization: Towards improving transferability of deep neural networks. Adv. Neural Inf. Process. Syst. 2019, 32, 1953–1963. [Google Scholar]
Remus, R. Domain adaptation using domain similarity-and domain complexity-based instance selection for cross-domain sentiment analysis. In Proceedings of the 2012 IEEE 12th International Conference on Data Mining Workshops, Brussels, Belgium, 10 December 2012; pp. 717–723. [Google Scholar]
Lu, H.; Shen, C.; Cao, Z.; Xiao, Y.; van den Hengel, A. An embarrassingly simple approach to visual domain adaptation. IEEE Trans. Image Process. 2018, 27, 3403–3417. [Google Scholar] [CrossRef]
Jin, X.; Yang, X.; Fu, B.; Chen, S. Joint distribution matching embedding for unsupervised domain adaptation. Neurocomputing 2020, 412, 115–128. [Google Scholar] [CrossRef]

Figure 1. Taxonomy of domain adaptation approaches.

Figure 2. WS-UDA’s architecture involves two steps. In Step 1, an adversarial-based UDA is trained to acquire the weights for the shared-feature extractor. In Step 2, the shared-feature extractor is more purposefully trained, starting with the initialization from Step 1.

Figure 3. Target domain accuracy in the Amazon data set. The multi-source/single-target setting. The target domain is shown on the x-axis.

Figure 4. Target domain accuracy in the FDU-MTL data set. The multi-source/single-target setting. The target domain is shown on the x-axis.

Figure 5. Average similarity between different domains. Different target domains are represented on the x-axis. The y-axis displays the average similarity between the labeled samples of multi-source domains and the unlabeled samples from the target domain.

Figure 6. Target domain accuracy in Amazon data set where the target domain is (a) books and (b) electronics. The single-source/single-target setting.

Figure 7. Target domain accuracy in Amazon data set where the target domain is (a) DVD and (b) kitchen. The single-source/single-target setting.

Figure 8. Target domain accuracy in the FDU-MTL data set. The single-source/single-target setting.

Figure 9. Target domain accuracy in the Spam data set. The single-source/single-target setting.

Figure 10. Average cosine similarity between different domains. Different target domains are represented on the x-axis. The y-axis displays the average cosine similarity between the labeled samples from various source domains and the unlabeled samples from the target domain.

Figure 11. The average performance improvement achieved by the WS-UDA architecture, along with the the normalized average cosine similarity for the three data sets.

Table 1. WS-UDA hyperparameters.

WS-UDA	Step 1	Step 2
Learning_rate	0.0001	0.001
Drop out	0.4	0.3
Activation Function	ReLU
Optimizer	Adam
C	128 + 64
D Size	64
K	5

Table 2. WS-UDA performance gains vs. average similarity.

	Target Domain	Improvement%	Avg. Similarity
Amazon	Kitchen	0.23%	0.9
	Electronics	0.87%	0.74
	DVD	1.21%	0.29
	Books	3.14%	0.2
FDU-MTL	Music	0.29%	0.90
	Books	0.73%	0.87
	Kitchen	0.79%	0.86
	Electronics	1.13%	0.76
	DVD	1.16%	0.76
	Camera	1.22%	0.69
	Apparel	1.23%	0.69
	Magazines	1.47%	0.67
	Health	2.30%	0.57
	Video	2.88%	0.52
	Toys	3.20%	0.47
	Software	4.17%	0.37
	MR	5.86%	0.12
	Sports	6.44%	0.30
	IMDb	6.93%	0.28
	Baby	7.10%	0.15

Table 3. WS-UDA performance gains vs. average similarity.

	Target Domain	Improvement%	Avg. Similarity
Amazon	Books	2.94%	0.20
	DVD	2.71%	0.29
	Electronics	2.24%	0.74
	Kitchen	1.46%	0.90
FDU-MTL	MR	3.64%	0.470
	Baby	2.74%	0.525
	Apparel	2.27%	0.545
Spam	Enron	2.34%	0.462
Spam	Spam	1.03%	0.632

Table 4. The accuracy % of the single-source/single-target versus multi-source/single-target implementation of our suggested WS-UDA. The “Avg. enhancement row” additionally showed the average improvement from applying the suggested WS-UDA over the baseline alone without sequential learning.

Target	Single-Source	Multi-Source
(a) Amazon data set
Books	74.5%	80.22%
DVD	76.5%	83.5%
Electronics	79.9%	85.7%
Kitchen	80.6%	87.5%
Avg. enhancement	$2.34 %$	$1.6 %$
(b) FDU-MTL data set
MR	70.9%	76.8%
Baby	77.3%	86.3%
Apparel	76.9%	86.8%
Avg. enhancement	$2.9 %$	$2.3 %$

Table 5. Accuracy percentages when tested on the apparel domain. The first row demonstrates performance when tested on apparel, trained with MR as the source and baby as the target. The second row presents similar results for the transfer task baby-MR.

A ⟶ B	Without Sequential	WS-UDA	Distance with Unseen C Domain and:
MR ⟶ Baby	65.75%	68.5%	A (MR)	B (Baby)
			0.384	0.9
Baby ⟶ MR	78.5%	79.5%	A (Baby)	B (MR)
			0.687	0.382
C is apparel domain.

Table 6. Accuracy percentages assessed for unseen domain as (a) the MR domain and (b) the baby domain across various transfer tasks.

A ⟶ B	Without Sequential	WS-UDA	Distance with Unseen C Domain and:
Baby ⟶ Apparel	67.0%	68.25%	A (Baby)	B (Apparel)
			0.323	0.382
Apparel ⟶ Baby	$65.75 %$	66.25%	A (Apparel)	B (Baby)
			0.2	0.323
(a) C is MR domain
Apparel ⟶ MR	81.5%	83.3%	A (Apparel)	B (MR)
			0.767	0.323
MR ⟶ Apparel	68.5%	71.0%	A (MR)	B (Apparel)
			0.585	0.9
(b) C is baby domain

Table 7. Accuracy percentages assessed when tested on the electronics domain, exemplified in the Amazon dataset across various transfer tasks.

A ⟶ B	Without Sequential	WS-UDA	Distance with Unseen C and:
Dvd ⟶ Books	75.6%	78.3%	A (Dvd)	B (Books)
			0.214	0.816
Dvd ⟶ Kitchen	72.3%	73.6%	A (Dvd)	B (Kitchen)
			0.214	0.2
Kitchen ⟶ Dvd	82.3%	84.3%	A (Kitchen)	B (Dvd)
			0.9	0.657
Kitchen ⟶ Books	82.8%	84.6%	A (Kitchen)	B (Books)
			0.9	0.816
Books ⟶ Dvd	71.3%	74.19%	A (Books)	B (Dvd)
			0.442	0.657
Books ⟶ Kitchen	72.9%	73.2%	A (Books)	B (Kitchen)
			0.442	0.2
C is electronics domain.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Badr, H.; Wanas, N.; Fayek, M. Unsupervised Domain Adaptation via Weighted Sequential Discriminative Feature Learning for Sentiment Analysis. Appl. Sci. 2024, 14, 406. https://doi.org/10.3390/app14010406

AMA Style

Badr H, Wanas N, Fayek M. Unsupervised Domain Adaptation via Weighted Sequential Discriminative Feature Learning for Sentiment Analysis. Applied Sciences. 2024; 14(1):406. https://doi.org/10.3390/app14010406

Chicago/Turabian Style

Badr, Haidi, Nayer Wanas, and Magda Fayek. 2024. "Unsupervised Domain Adaptation via Weighted Sequential Discriminative Feature Learning for Sentiment Analysis" Applied Sciences 14, no. 1: 406. https://doi.org/10.3390/app14010406

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Unsupervised Domain Adaptation via Weighted Sequential Discriminative Feature Learning for Sentiment Analysis

Abstract

1. Introduction

2. Related Works

2.1. Shallow Domain Adaptation Approaches

2.2. Deep Domain Adaptation Approaches

2.2.1. Traditional Deep Domain Adaptation

2.2.2. Adversarial Deep Domain Adaptation

3. Motivation

4. Weighted Sequential Unsupervised Domain Adaption Framework

4.1. WS-UDA Step 1

4.2. WS-UDA Step 2

5. Experiments and Evaluation

5.1. Data Sets

5.2. Comparison Baselines

5.3. Performance of WS-UDA in Multi-Source/Single-Target Setting

5.4. Performance of WS-UDA in Single-Source/Single-Target Setting

5.5. Performance of WS-UDA in Single-Source vs. Multi-Source Comparison

6. Ablation Study

7. Conclusions

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI