Transformer Decoder

class textbox.module.Decoder.transformer_decoder.TransformerDecoder(embedding_size, ffn_size, num_dec_layers, num_heads, attn_dropout_ratio=0.0, attn_weight_dropout_ratio=0.0, ffn_dropout_ratio=0.0, with_external=True)[source]

Bases: Module

The stacked Transformer decoder layers.

forward(x, kv=None, self_padding_mask=None, self_attn_mask=None, external_states=None, external_padding_mask=None)[source]

Implement the decoding process: pass the target embeddings through each stacked decoder layer in turn.

Parameters
  • x (torch.Tensor) – target sequence embedding, shape: [batch_size, sequence_length, embedding_size].

  • kv (torch.Tensor) – cached hidden states from previous decoding steps, used for incremental decoding, shape: [batch_size, sequence_length, embedding_size], default: None.

  • self_padding_mask (torch.Tensor) – padding mask of the target sequence, shape: [batch_size, sequence_length], default: None.

  • self_attn_mask (torch.Tensor) – causal attention mask of the target sequence, preventing each position from attending to subsequent positions, shape: [batch_size, sequence_length, sequence_length], default: None.

  • external_states (torch.Tensor) – output features of the encoder, shape: [batch_size, source_sequence_length, feature_size], default: None.

  • external_padding_mask (torch.Tensor) – padding mask of the source sequence, shape: [batch_size, source_sequence_length], default: None.

Returns

output features, shape: [batch_size, sequence_length, ffn_size].

Return type

torch.Tensor

training: bool
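
A minimal usage sketch follows. All sizes are placeholder values, the random tensors stand in for real embeddings and encoder outputs, feature_size is assumed equal to embedding_size, and the boolean mask convention (True marks positions to ignore) is an assumption to verify against the implementation:

import torch
from textbox.module.Decoder.transformer_decoder import TransformerDecoder

# Placeholder hyperparameters (not taken from the library's defaults).
batch_size, tgt_len, src_len = 4, 16, 20
embedding_size, ffn_size = 128, 512

decoder = TransformerDecoder(
    embedding_size=embedding_size,
    ffn_size=ffn_size,
    num_dec_layers=2,
    num_heads=8,
)

# Random stand-ins: target embeddings and encoder output features
# (feature_size assumed equal to embedding_size here).
x = torch.rand(batch_size, tgt_len, embedding_size)
external_states = torch.rand(batch_size, src_len, embedding_size)

# Causal attention mask: True above the diagonal blocks attention
# from position i to any later position j > i.
causal = torch.triu(torch.ones(tgt_len, tgt_len, dtype=torch.bool), diagonal=1)
self_attn_mask = causal.unsqueeze(0).expand(batch_size, -1, -1)

# Padding masks; all False here because no positions are padded
# (assumed convention: True marks padded positions).
self_padding_mask = torch.zeros(batch_size, tgt_len, dtype=torch.bool)
external_padding_mask = torch.zeros(batch_size, src_len, dtype=torch.bool)

output = decoder(
    x,
    self_padding_mask=self_padding_mask,
    self_attn_mask=self_attn_mask,
    external_states=external_states,
    external_padding_mask=external_padding_mask,
)
print(output.shape)  # [batch_size, tgt_len, ...] per the Returns note above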