textbox.module.Attention

Attention Layers

class textbox.module.Attention.attention_mechanism.BahdanauAttention(source_size, target_size)[source]

Bases: Module

Bahdanau Attention is proposed in the following paper:

Neural Machine Translation by Jointly Learning to Align and Translate.

Reference:

https://arxiv.org/abs/1409.0473
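
As a rough sketch (the exact parameterization inside this module, e.g. bias terms, may differ), the additive score between decoder state s_t and encoder output h_j is

    e_{t,j} = v^T tanh(W_1 s_t + W_2 h_j),        probs_{t,j} = softmax_j(e_{t,j})

and the returned context_t is the probs-weighted sum of the encoder outputs.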

forward(hidden_states, encoder_outputs, encoder_masks)[source]

Bahdanau attention

Parameters
  • hidden_states – shape: [batch_size, tgt_len, target_size]

  • encoder_outputs – shape: [batch_size, src_len, source_size]

  • encoder_masks – shape: [batch_size, src_len]

Returns

  • context: shape: [batch_size, tgt_len, source_size]

  • probs: shape: [batch_size, tgt_len, src_len]

Return type

tuple
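
A minimal usage sketch; the concrete sizes (source_size=16, target_size=32, batch_size=2, src_len=5, tgt_len=3) and the all-ones mask are assumed for illustration, not taken from the library's own examples:

>>> import torch
>>> from textbox.module.Attention.attention_mechanism import BahdanauAttention
>>> attn = BahdanauAttention(source_size=16, target_size=32)
>>> hidden_states = torch.randn(2, 3, 32)    # [batch_size, tgt_len, target_size]
>>> encoder_outputs = torch.randn(2, 5, 16)  # [batch_size, src_len, source_size]
>>> encoder_masks = torch.ones(2, 5)         # [batch_size, src_len], 1 = valid position
>>> context, probs = attn(hidden_states, encoder_outputs, encoder_masks)
>>> # per the shapes documented above: context [2, 3, 16], probs [2, 3, 5]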

score(hidden_states, encoder_outputs)[source]

Calculate the attention scores between encoder outputs and decoder states.

training: bool
class textbox.module.Attention.attention_mechanism.LuongAttention(source_size, target_size, alignment_method='concat', is_coverage=False)[source]

Bases: Module

Luong Attention is proposed in the following paper: Effective Approaches to Attention-based Neural Machine Translation.

Reference:

https://arxiv.org/abs/1508.04025

forward(hidden_states, encoder_outputs, encoder_masks, coverages=None)[source]

Luong attention

Parameters
  • hidden_states – shape: [batch_size, tgt_len, target_size]

  • encoder_outputs – shape: [batch_size, src_len, source_size]

  • encoder_masks – shape: [batch_size, src_len]

Returns

  • context: shape: [batch_size, tgt_len, source_size]

  • probs: shape: [batch_size, tgt_len, src_len]

Return type

tuple
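
A usage sketch analogous to the Bahdanau example above; alignment_method='concat' is simply the constructor default, and passing coverages is presumably only meaningful when is_coverage=True:

>>> import torch
>>> from textbox.module.Attention.attention_mechanism import LuongAttention
>>> attn = LuongAttention(source_size=16, target_size=32, alignment_method='concat')
>>> hidden_states = torch.randn(2, 3, 32)    # [batch_size, tgt_len, target_size]
>>> encoder_outputs = torch.randn(2, 5, 16)  # [batch_size, src_len, source_size]
>>> encoder_masks = torch.ones(2, 5)         # [batch_size, src_len]
>>> context, probs = attn(hidden_states, encoder_outputs, encoder_masks)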

score(hidden_states, encoder_outputs, coverages=None)[source]

Calculate the attention scores between encoder outputs and decoder states.

training: bool
class textbox.module.Attention.attention_mechanism.MonotonicAttention(source_size, target_size, init_r=-4)[source]

Bases: Module

Monotonic Attention is proposed in the following paper:

Online and Linear-Time Attention by Enforcing Monotonic Alignments.

Reference:

https://arxiv.org/abs/1704.00784

exclusive_cumprod(x)[source]

Exclusive cumulative product [a, b, c] => [1, a, a * b]
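
A stand-alone sketch of the same operation in plain PyTorch (the module's own implementation may differ in detail):

>>> import torch
>>> x = torch.tensor([0.5, 0.4, 0.3])
>>> shifted = torch.cat([torch.ones(1), x[:-1]])  # prepend 1, drop the last element
>>> torch.cumprod(shifted, dim=0)                 # [1, a, a*b]
tensor([1.0000, 0.5000, 0.2000])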

gaussian_noise(*size)[source]

Additive Gaussian noise to encourage discreteness.

hard(hidden_states, encoder_outputs, encoder_masks, previous_probs=None)[source]

Hard monotonic attention (used at inference/test time)

Parameters
  • hidden_states – shape: [batch_size, tgt_len, target_size]

  • encoder_outputs – shape: [batch_size, src_len, source_size]

  • encoder_masks – shape: [batch_size, src_len]

  • previous_probs – shape: [batch_size, tgt_len, src_len]

Returns

  • context: shape: [batch_size, tgt_len, source_size]

  • probs: shape: [batch_size, tgt_len, src_len]

Return type

tuple
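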

safe_cumprod(x)[source]

Numerically stable cumulative product, computed via a cumulative sum in log-space.
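
The usual trick, shown here as a rough stand-alone sketch: clamp the inputs away from zero, take logs, cumulatively sum, and exponentiate.

>>> import torch
>>> def safe_cumprod_sketch(x, eps=1e-10):
...     # cumprod(x) == exp(cumsum(log(x))), computed in log-space for stability
...     return torch.exp(torch.cumsum(torch.log(torch.clamp(x, eps, 1.0)), dim=-1))
>>> safe_cumprod_sketch(torch.tensor([0.9, 0.5, 0.1]))
tensor([0.9000, 0.4500, 0.0450])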

score(hidden_states, encoder_outputs)[source]

Calculate the attention scores between encoder outputs and decoder states.

soft(hidden_states, encoder_outputs, encoder_masks, previous_probs=None)[source]

Soft monotonic attention (used during training)

Parameters
  • hidden_states – shape: [batch_size, tgt_len, target_size]

  • encoder_outputs – shape: [batch_size, src_len, source_size]

  • encoder_masks – shape: [batch_size, src_len]

  • previous_probs – shape: [batch_size, tgt_len, src_len]

Returns

  • context: shape: [batch_size, tgt_len, source_size]

  • probs: shape: [batch_size, tgt_len, src_len]

Return type

tuple
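
A sketch of how the two modes are typically driven, one decoder step at a time; the stepwise tgt_len=1 usage, the None initialization, and feeding probs back as previous_probs are assumptions about intended use, not behavior verified against the library:

>>> import torch
>>> from textbox.module.Attention.attention_mechanism import MonotonicAttention
>>> attn = MonotonicAttention(source_size=16, target_size=32)
>>> hidden_states = torch.randn(2, 1, 32)    # one decoder step: [batch_size, 1, target_size]
>>> encoder_outputs = torch.randn(2, 5, 16)  # [batch_size, src_len, source_size]
>>> encoder_masks = torch.ones(2, 5)
>>> # training: differentiable expected alignments via soft()
>>> context, probs = attn.soft(hidden_states, encoder_outputs, encoder_masks, previous_probs=None)
>>> context, probs = attn.soft(hidden_states, encoder_outputs, encoder_masks, previous_probs=probs)
>>> # inference: discrete left-to-right alignments, driven the same way via hard()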

training: bool
class textbox.module.Attention.attention_mechanism.MultiHeadAttention(embedding_size, num_heads, attn_weight_dropout_ratio=0.0, return_distribute=False)[source]

Bases: Module

Multi-head Attention is proposed in the following paper:

Attention Is All You Need.

Reference:

https://arxiv.org/abs/1706.03762
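
The per-head computation is the scaled dot-product attention of the referenced paper; head splitting/merging and the optional dropout on the attention weights are handled inside the module:

    Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V,        d_k = embedding_size / num_heads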

forward(query, key, value, key_padding_mask=None, attn_mask=None)[source]

Multi-head attention

Parameters
  • query – shape: [batch_size, tgt_len, embedding_size]

  • key and value – shape: [batch_size, src_len, embedding_size]

  • key_padding_mask – shape: [batch_size, src_len]

  • attn_mask – shape: [batch_size, tgt_len, src_len]

Returns

  • attn_repre: shape: [batch_size, tgt_len, embedding_size]

  • attn_weights: shape: [batch_size, tgt_len, src_len]

Return type

tuple
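
A shape-level usage sketch for self-attention with a causal mask; the boolean masking convention (True = position is blocked) follows common PyTorch practice and is an assumption, not something verified against this module:

>>> import torch
>>> from textbox.module.Attention.attention_mechanism import MultiHeadAttention
>>> attn = MultiHeadAttention(embedding_size=64, num_heads=8)
>>> x = torch.randn(2, 7, 64)  # [batch_size, seq_len, embedding_size]
>>> causal = torch.triu(torch.ones(2, 7, 7, dtype=torch.bool), diagonal=1)  # [batch_size, tgt_len, src_len]
>>> outputs = attn(x, x, x, attn_mask=causal)  # documented above as (attn_repre, attn_weights)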

reset_parameters()[source]
training: bool
class textbox.module.Attention.attention_mechanism.SelfAttentionMask(init_size=100)[source]

Bases: Module

forward(size)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

static get_mask(size)[source]
training: bool
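
A sketch of the intended use: the mask blocks attention to future positions. The exact dtype and True/False convention of the returned mask are assumptions; the plain-PyTorch expression below shows the upper-triangular pattern being described.

>>> import torch
>>> from textbox.module.Attention.attention_mechanism import SelfAttentionMask
>>> mask_module = SelfAttentionMask(init_size=100)
>>> mask = mask_module(4)  # causal mask for a length-4 sequence
>>> torch.triu(torch.ones(4, 4, dtype=torch.bool), diagonal=1)  # roughly equivalent pattern
tensor([[False,  True,  True,  True],
        [False, False,  True,  True],
        [False, False, False,  True],
        [False, False, False, False]])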