Data Augmentation using Pre-trained Transformer Models

- 9 mins

Authors: Varun Kumar, Ashutosh Choudhary, Eunah Cho
Alexa AI (Amazon)
arXiv
Paper: https://arxiv.org/pdf/2004.12239.pdf
Code: https://github.com/varinf/TransformersDataAugmentation (TBD)


Summary

Personal Thoughts


Abstract

Language model-based pre-trained models such as BERT have provided significant gains across different NLP tasks. In this paper, we study different types of pre-trained transformer-based models such as autoregressive models (GPT-2), autoencoder models (BERT), and seq2seq models (BART) for conditional data augmentation. We show that prepending the class labels to text sequences provides a simple yet effective way to condition the pre-trained models for data augmentation. On three classification benchmarks, the pre-trained Seq2Seq model outperforms other models. Further, we explore how data augmentation based on different pre-trained models differs in terms of data diversity, and how well such methods preserve the class-label information.
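The label-prepending idea from the abstract is easy to make concrete. The snippet below is a minimal sketch (not the authors' released code) of how training examples for an autoregressive LM such as GPT-2 might be formatted as "label SEP text EOS" sequences; the literal SEP/EOS marker strings and the format_example helper are illustrative assumptions, not the paper's exact tokens.

```python
# Minimal sketch of label-prepended formatting for conditional DA.
# Assumption: the class label is prepended to each training sentence; the
# literal "SEP"/"EOS" strings below are illustrative placeholders.

SEP, EOS = "SEP", "EOS"

def format_example(label: str, text: str) -> str:
    """Turn (label, text) into a single sequence an AR LM can be fine-tuned on."""
    return f"{label} {SEP} {text} {EOS}"

train_pairs = [
    ("positive", "the film is a delight from start to finish"),
    ("negative", "a tedious, overlong mess"),
]

lm_training_lines = [format_example(y, x) for y, x in train_pairs]
for line in lm_training_lines:
    print(line)

# At generation time, the fine-tuned LM is prompted with "label SEP"
# (optionally plus a few seed words) and sampled until EOS to obtain a
# new synthetic example for that class.
```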

1. Introduction

2. DA using Pre-trained Models

DA Problem formulation

2.1. Conditional DA using Pre-trained LM

2.1.1. Fine-tuning and generation using AE LMs

2.1.2. Fine-tuning and generation using AR

2.2. Conditional DA using Pre-trained Seq2Seq model

2.2.1. Fine-tuning and generation using Seq2Seq BART

2.3. Pre-trained Model Implementation

2.3.1. BERT based models

2.3.2. GPT2 model implementation

2.3.3. BART model implementation

3. Experimental setup

3.1. Baseline Approaches for DA

3.2. Data Sets

3.2.1. Low-resource data scenario

3.3. Evaluation

3.3.1. Classifiers for intrinsic evaluation

4. Results and Discussion

4.1. Generation by Conditioning on Labels

4.2. Pre-trained Model Comparison

Generated Data Fidelity

Generated Data Diversity

4.3. Guidelines For Using Different Types Of Pre-trained Models For DA

5. Conclusion And Future Work

We show that AE, AR, and Seq2Seq pre-trained models can be conditioned on labels by prepending label information and provide an effective way to augment training data. These DA methods can be easily combined with other advances in text content manipulation such as co-training the data generator and classifier (Hu et al., 2019). We hope that unifying different DA methods would inspire new approaches for universal NLP data augmentation.
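To make the label-conditioned generation step concrete, here is a hedged sketch using Hugging Face Transformers, assuming a GPT-2 checkpoint that has already been fine-tuned on label-prepended sequences (the fine-tuning itself is standard causal-LM training and is omitted). The checkpoint path, the "SEP" marker, and the sampling hyperparameters are placeholders, not the paper's exact settings.

```python
# Sketch of label-conditioned generation with a (hypothetically) fine-tuned GPT-2.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model_dir = "path/to/finetuned-gpt2"  # placeholder for a fine-tuned checkpoint
tokenizer = GPT2Tokenizer.from_pretrained(model_dir)
model = GPT2LMHeadModel.from_pretrained(model_dir)
model.eval()

prompt = "positive SEP"  # condition on the class label (plus optional seed words)
input_ids = tokenizer.encode(prompt, return_tensors="pt")

outputs = model.generate(
    input_ids,
    do_sample=True,          # sample rather than greedy-decode for diversity
    top_p=0.9,
    max_length=64,
    num_return_sequences=3,  # several synthetic examples per prompt
    pad_token_id=tokenizer.eos_token_id,
)
for out in outputs:
    print(tokenizer.decode(out, skip_special_tokens=True))
```

The sampled continuations are then treated as additional labeled training examples for the prompted class; an analogous recipe applies to the AE and Seq2Seq variants, with BERT filling in masked tokens and BART reconstructing masked spans.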


Dongju Park
Research Scientist / Engineer @ NAVER CLOVA