Blank Language Models

- 9 mins

Authors : Tianxiao Shen, Victor Quach, Regina Barzilay, Tommi Jaakkola (MIT Computer Science & Artificial Intelligence Laboratory, MIT CSAIL)

Paper : https://arxiv.org/pdf/2002.03079.pdf
Code : TBA


Summary

Introduces the Blank Language Model (BLM), designed for the task of dynamically creating and filling in blanks.

As a pure language model it is somewhat weaker, but it shows strong performance on tasks that fill in missing parts of a text.


Abstract

We propose Blank Language Model (BLM), a model that generates sequences by dynamically creating and filling in blanks. Unlike previous masked language models (Devlin et al., 2018) or the Insertion Transformer (Stern et al., 2019), BLM uses blanks to control which part of the sequence to expand. This fine-grained control of generation is ideal for a variety of text editing and rewriting tasks. The model can start from a single blank or partially completed text with blanks at specified locations. It iteratively determines which word to place in a blank and whether to insert new blanks, and stops generating when no blanks are left to fill. BLM can be efficiently trained using a lower bound of the marginal data likelihood, and achieves perplexity comparable to traditional left-to-right language models on the Penn Treebank and WikiText datasets. On the task of filling missing text snippets, BLM significantly outperforms all other baselines in terms of both accuracy and fluency. Experiments on style transfer and damaged ancient text restoration demonstrate the potential of this framework for a wide range of applications.

1. Introduction

3. Blank Language Models

A blank language model (BLM) generates sequences by creating and filling in blanks. Generation starts with a single blank and ends when there is no blank. In each step, the model selects a blank “__”, predicts a word w, and fills the blank with “w”, “_w”, “w_”, or “_w_”. In this way, a blank can be expanded to any number of words.

\[\text{BLM}(c) = p(b, w, l, r \mid c;\theta)\]

Here c is the current canvas, b is the blank selected to be filled, w is the word placed in it, and l, r indicate whether a new blank is created to the left and to the right of w.

Each blank is a nonterminal symbol as well as the start symbol, and the terminal symbols are the words from the vocabulary V (generation terminates when no blank remains).

The production rules are restricted to the following form:

“__” → “__? w __?”, for w ∈ V,

where “?” indicates that the symbol is optional, i.e., the right-hand side is one of the four options “w”, “__ w”, “w __”, or “__ w __”.
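To make the generation procedure concrete, here is a minimal Python sketch of greedy BLM decoding. The `model` object and its `predict_action` method are hypothetical stand-ins for the actual network; the loop only illustrates how a blank is selected, filled with a word, and optionally re-created on either side.

```python
BLANK = "__"

def generate(model, canvas=None, max_steps=100):
    """Fill blanks until none remain (or the step budget is exhausted)."""
    if canvas is None:
        canvas = [BLANK]                       # free generation starts from a single blank
    for _ in range(max_steps):
        if BLANK not in canvas:                # no blanks left: generation ends
            break
        # Hypothetical call: the model jointly picks a blank index b, a word w,
        # and whether to keep a blank to the left (l) and/or right (r) of w.
        b, w, l, r = model.predict_action(canvas)
        filled = ([BLANK] if l else []) + [w] + ([BLANK] if r else [])
        canvas = canvas[:b] + filled + canvas[b + 1:]
    return canvas

# Usage (with a trained model):
#   generate(model)                                  -> free generation from "__"
#   generate(model, ["they", "have", BLANK, "."])    -> fill a given template
```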

3.1 Model Architecture

3.2 Likelihood

The same final text “x” may be realized by multiple trajectories. However, if we specify the order in which the words in “x” are generated, the trajectory is also uniquely determined. This follows from the fact that BLM never results in a canvas with two (or more) consecutive blanks.

This one-to-one correspondence between trajectories and generation orders lets us define the marginal likelihood as follows:

\[p(x;\theta)=\sum\limits_{\sigma\in{S_n}}p(x,\sigma;\theta)=\sum\limits_{\sigma\in{S_n}}\prod\limits_{t=0}^{n-1}p(a_t^{x,\sigma}|c_t^{x,\sigma};\theta)\]
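To make the correspondence between orders and trajectories (and the product over steps) concrete, here is a Python sketch that replays the trajectory determined by one generation order σ and accumulates log p(x, σ; θ). The `model.action_logprob` call is a hypothetical per-action scoring function, not the paper's actual interface.

```python
def log_joint(model, x, order):
    """log p(x, sigma; theta) = sum over t of log p(a_t | c_t) for one order.

    x     : list of n tokens
    order : permutation of range(n); order[t] is the position of x filled at step t
    """
    # Each canvas element is either a placed word or a blank covering a span
    # [lo, hi) of positions in x that have not been generated yet.
    canvas = [("blank", 0, len(x))]
    total = 0.0
    for pos in order:
        # The blank whose span contains this position is unique, so the order
        # uniquely determines the trajectory.
        b = next(i for i, e in enumerate(canvas)
                 if e[0] == "blank" and e[1] <= pos < e[2])
        _, lo, hi = canvas[b]
        l, r = pos > lo, pos + 1 < hi      # keep a blank to the left / right of w?
        total += model.action_logprob(render(canvas), b, x[pos], l, r)  # hypothetical call
        canvas = (canvas[:b]
                  + ([("blank", lo, pos)] if l else [])
                  + [("word", x[pos])]
                  + ([("blank", pos + 1, hi)] if r else [])
                  + canvas[b + 1:])
    return total

def render(canvas):
    """Flatten the span representation into the token sequence the model sees."""
    return [e[1] if e[0] == "word" else "__" for e in canvas]
```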

3.3 Training

Here we propose training objectives derived from the log likelihood. Directly computing the marginal likelihood over n! orders is intractable. We use Jensen’s inequality to lower bound the log likelihood:

\[\log p(x;\theta)=\log\sum\limits_{\sigma\in{S_n}}p(x,\sigma;\theta)\geq\log n!+\frac{1}{n!}\sum\limits_{\sigma\in{S_n}}\log p(x,\sigma;\theta)\]

where equality holds when the posterior p(σ | x; θ) is uniform.

To train more efficiently, we note that the canvas \(c_t^{x,\sigma}\) depends only on the first t elements of σ. Hence we can combine the loss calculations of trajectories that share the same first t steps but differ at the (t + 1)-th step, by switching the order of summation over σ and t.
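A minimal sketch of one training step under this bound, in Python. Sampling a single order σ and a single step t per example is an assumption made here for illustration (the paper's exact Monte Carlo estimator may average differently), and `model.action_logprob` is the same hypothetical scoring call used in the likelihood sketch above.

```python
import random

def training_loss(model, x):
    """One-sample Monte Carlo estimate of the negative lower bound for sentence x.

    Sample a uniformly random order sigma and a step t, replay the first t fills
    to obtain the canvas, and score only the (t + 1)-th action. Multiplying by n
    makes the single-step loss match, in expectation, the sum over all n steps.
    """
    n = len(x)
    sigma = random.sample(range(n), n)        # uniformly random generation order
    t = random.randrange(n)                   # which step to supervise
    canvas = [("blank", 0, n)]                # span representation, as in log_joint()
    for step, pos in enumerate(sigma):
        b = next(i for i, e in enumerate(canvas)
                 if e[0] == "blank" and e[1] <= pos < e[2])
        _, lo, hi = canvas[b]
        l, r = pos > lo, pos + 1 < hi
        if step == t:
            # render() is the helper defined in the likelihood sketch above.
            return -n * model.action_logprob(render(canvas), b, x[pos], l, r)
        canvas = (canvas[:b]
                  + ([("blank", lo, pos)] if l else [])
                  + [("word", x[pos])]
                  + ([("blank", pos + 1, hi)] if r else [])
                  + canvas[b + 1:])
```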

4. Experiments

Experimental Details

We note that beam search in BLM does not search for the sentence with the maximum marginal likelihood p(x; θ), but instead for a sentence and a trajectory that have the maximum joint likelihood p(x, σ; θ).

4.1. Language Modeling

4.2. Text Infilling

The text infilling task takes a portion of a text and fills in the missing pieces (see Figure 4 in the paper).

It evaluates the model's ability to fill in blanks while remaining semantically consistent with the surrounding context.
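As a concrete illustration (an invented sentence, not one from the paper's datasets), infilling with the decoding sketch from Section 3 just means starting from a partially specified canvas instead of a single blank:

```python
# Blanks mark the missing snippets; the model only expands those positions.
template = ["the", "movie", "was", BLANK, "but", "the", "ending", BLANK, "."]
completed = generate(model, template)
```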

4.3. Ancient Text Restoration

Ancient text restoration is a text infilling task that recovers the missing parts of ancient documents.

Because the task is character-level, the number of missing characters is indicated with “?” symbols.

A length-aware BLM is used for this task.
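Below is a small sketch of how a damaged, character-level input might be converted into a canvas for a length-aware model, where each run of “?” becomes a blank annotated with the number of characters it must contain. The blank encoding and the sample string are assumptions made here for illustration.

```python
import re

def to_canvas(damaged):
    """Turn a damaged string into a character-level canvas.

    Runs of "?" (one per missing character) become blanks annotated with the
    number of characters they must hold, which is what a length-aware model
    conditions on. The encoding here is illustrative, not the paper's.
    """
    canvas = []
    for piece in re.split(r"(\?+)", damaged):
        if not piece:
            continue
        if piece[0] == "?":
            canvas.append(("blank", len(piece)))   # must hold exactly len(piece) characters
        else:
            canvas.extend(piece)                   # known characters, one token each
    return canvas

# to_canvas("αρχ??ν")  (invented fragment)
# -> ['α', 'ρ', 'χ', ('blank', 2), 'ν']
```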

4.4. Sentiment Transfer

The sentiment transfer task changes the sentiment of a sentence while preserving its topic.

A two-step style-transfer approach is adopted:

  1. Remove words and expressions of high polarity from the source sentence;
  2. Complete the partial sentence with words and expressions of the target sentiment.

For step 1, two methods are considered: one based on frequency ratios and one based on attention scores.

Step 2 is carried out by the BLM.

With two sentiments, a separate BLM is trained on the data of each sentiment.
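A minimal sketch of this two-step pipeline, reusing `generate()` from the decoding sketch in Section 3; `remove_polar_words` (step 1, frequency-ratio or attention based) is a hypothetical placeholder and is not implemented here.

```python
def transfer_sentiment(sentence, target, blms, remove_polar_words):
    """Two-step sentiment transfer sketch.

    blms               : dict mapping "positive" / "negative" to a BLM trained
                         on sentences of that sentiment
    remove_polar_words : hypothetical step-1 routine that replaces high-polarity
                         spans in the source sentence with BLANK
    """
    template = remove_polar_words(sentence)      # step 1: e.g. ["the", "food", "was", BLANK, "."]
    return generate(blms[target], template)      # step 2: fill with the target sentiment's BLM
```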

5. Conclusion

In this paper, we proposed the blank language model for flexible text generation. BLMs can generate sequences in different orders by dynamically creating and filling in blanks. We demonstrate the effectiveness of our method on various text rewriting tasks, including text infilling, ancient text restoration and style transfer. Future work may explore sequence modeling tasks beyond text rewriting that also benefit from flexible generation order. An example is music modeling: harmonic constraints naturally impose a canvas that composers fill in with the melody.

Dongju Park

Research Scientist / Engineer @ NAVER CLOVA
