Text Embedding Augmentation Based on Retraining With Pseudo-Labeled Adversarial Embedding

Pre-trained language models (LMs) achieve outstanding performance on various natural language processing tasks; however, because these models carry a very large number of parameters in order to handle large-scale text corpora during pre-training, they risk overfitting when fine-tuned on small task-oriented datasets. In this paper, we propose a text embedding augmentation method to prevent such overfitting. The proposed method augments a text embedding by generating an adversarial embedding, which is not identical to the original input embedding but preserves its characteristics, using PGD-based (projected gradient descent) adversarial training on the input text data. A pseudo-label identical to the label of the input text is then assigned to the adversarial embedding, and the resulting embedding-label pair is used as input to retrain a separate LM.
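To make the procedure concrete, below is a minimal PyTorch sketch of the PGD step on input embeddings. This is an illustration under stated assumptions, not the paper's exact implementation: the function name `pgd_adversarial_embeddings`, the HuggingFace-style classifier that accepts `inputs_embeds`, and the `epsilon`/`alpha`/`steps` values are all assumptions introduced here.

```python
import torch
import torch.nn.functional as F

def pgd_adversarial_embeddings(model, embeddings, labels, attention_mask,
                               epsilon=1e-2, alpha=1e-3, steps=3):
    """Generate adversarial embeddings close to `embeddings` via PGD.

    `model` is assumed to be a HuggingFace-style sequence classifier that
    accepts `inputs_embeds` (e.g., `embeddings` obtained from
    `model.get_input_embeddings()(input_ids)`). Hyperparameters are
    illustrative, not the paper's settings.
    """
    delta = torch.zeros_like(embeddings, requires_grad=True)
    for _ in range(steps):
        logits = model(inputs_embeds=embeddings + delta,
                       attention_mask=attention_mask).logits
        loss = F.cross_entropy(logits, labels)
        # Gradient w.r.t. the perturbation only; model weights stay untouched.
        grad, = torch.autograd.grad(loss, delta)
        with torch.no_grad():
            # Ascend the loss, then project back into the L2 ball of radius
            # epsilon so the perturbed embedding keeps the original's
            # characteristics without collapsing onto it.
            delta += alpha * grad / (grad.norm() + 1e-12)
            norm = delta.norm()
            if norm > epsilon:
                delta.mul_(epsilon / norm)
    return (embeddings + delta).detach()

# Retraining sketch: the adversarial embedding inherits the original label as
# its pseudo-label and is fed to a separate LM via `inputs_embeds`
# (`source_lm`, `target_lm`, `embeds`, and `mask` are hypothetical names):
# adv = pgd_adversarial_embeddings(source_lm, embeds, labels, mask)
# loss = F.cross_entropy(
#     target_lm(inputs_embeds=adv, attention_mask=mask).logits, labels)
```

Perturbing the continuous embedding rather than the discrete tokens keeps the augmentation fully differentiable, which is what allows the PGD loop above to search for a perturbation in the first place.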

Experimental results on several text classification benchmark datasets demonstrate that the proposed method effectively prevents the overfitting that commonly occurs when adapting a large-scale pre-trained LM to a specific task.
