Hi everyone! In the previous parts we preprocessed our dataset and then built and trained our Transformer model for English-to-Spanish translation. Now it's time to test it out. The process of using a trained model to generate predictions is known as inference.
Before we begin, let's study the theory behind generating the output sequence. In the last part we saw that our Transformer is a sequence-to-sequence model consisting of an encoder and a decoder. During training, the decoder needed two kinds of input to generate the translation: the encoder's representation of the source sentence, and the target-language tokens up to the previous timestep. This way we made the model learn to predict the next word, instead of just learning a one-to-one mapping between source and target words.
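To make that shifted-input setup concrete, here is a minimal sketch of teacher forcing. The token IDs below are invented purely for illustration; in our pipeline they would come from the Spanish SentencePiece vocabulary (esp_sp).

# A minimal sketch of the shifted decoder inputs described above.
# The IDs are made up for illustration; in the real pipeline they come
# from the Spanish SentencePiece vocabulary.
bos_id, eos_id = 1, 2
target = [bos_id, 11, 42, 7, eos_id]   # e.g. [SOS] el gato durmió [EOS]
decoder_input = target[:-1]            # what the decoder sees at each timestep
decoder_target = target[1:]            # what it must learn to predict
print(decoder_input)                   # [1, 11, 42, 7]
print(decoder_target)                  # [11, 42, 7, 2]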
So in inference we will do the same process iteratively: we will pass the English sentence to the encoder and the [SOS] token to the decoder, and then, as the decoder starts to generate output tokens, we will keep feeding them back to the decoder for further generation.
def translate(english_sentence):
    """
    Translates an English sentence to Spanish using the trained Transformer model.
    """
    # Preprocess the input sentence
    tokenized_english = eng_sp.encode(english_sentence.lower())
    encoder_input = tf.constant([tokenized_english], dtype=tf.int64)

    # The decoder's input starts with the BOS token
    decoder_input = [esp_sp.bos_id()]
    output = tf.constant([decoder_input], dtype=tf.int64)

    for _ in range(config.MAX_LENGTH):
        # Make a prediction
        predictions = transformer([encoder_input, output], training=False)

        # Select the last token from the seq_len dimension
        predictions = predictions[:, -1:, :]  # (batch_size, 1, vocab_size)

        # Get the token with the highest probability (greedy search).
        # This returns a tensor with dtype=tf.int64.
        predicted_id = tf.argmax(predictions, axis=-1)

        # Append the predicted token to the output
        output = tf.concat([output, predicted_id], axis=-1)

        # Stop as soon as the EOS token is predicted
        if int(predicted_id[0, 0]) == esp_sp.eos_id():
            break

    # Decode the sequence of token IDs back to a text string
    predicted_sentence = esp_sp.decode(output.numpy().flatten().tolist())
    return predicted_sentence
In this code, we first convert our English input sentence into tokens. For the first decoding step, we provide the [SOS] (start-of-sequence) token along with the encoded representation of the complete English sentence. The model then predicts the next token in the target (Spanish) sequence: not the actual word, just its token ID.
In each subsequent step, we feed the decoder with the tokens it has generated so far together with the encoded English sentence, allowing it to predict the next token. This process repeats until the model outputs the [EOS] (end-of-sequence) token.
Once the [EOS] token is produced, we take all the predicted tokens and decode them back into words to form the final Spanish translation.
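If you want to see that final ID-to-text step in isolation, here is a quick round-trip through the Spanish SentencePiece model. The exact IDs you see will depend on your trained vocabulary.

# Round-trip through the Spanish SentencePiece tokenizer.
# (The printed IDs depend on your trained vocabulary.)
ids = esp_sp.encode("hola mundo")
print(ids)                 # a list of integer token IDs
print(esp_sp.decode(ids))  # "hola mundo"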
In the previous section, we implemented the Greedy Search technique, where we selected the token with the highest probability at each decoding step. However, this approach can sometimes lead to suboptimal translations, since the locally most probable token might not result in the best overall sequence.
To address this limitation, we can consider multiple possible sequences instead of just one. The idea is to explore several high-probability paths simultaneously and keep track of those that lead to the lowest overall loss (or highest total probability) in the long run.
In simpler terms, at each decoding step, instead of picking only the single best token, we look at the top k most probable tokens. For each of these, we then generate their next k possible continuations, evaluate their cumulative probabilities (or losses), and keep the top k best-performing sequences. We continue this process until an [EOS] token is produced.
This approach is known as Beam Search, a memory-bounded variant of the Best-First Search algorithm from heuristic search methods.
def translate(english_sentence, beam_width=3):
    """
    Translates an English sentence to Spanish using the trained Transformer model
    with beam search.
    """
    # Preprocess the input sentence
    tokenized_english = eng_sp.encode(english_sentence.lower())
    encoder_input = tf.constant([tokenized_english], dtype=tf.int64)

    # The decoder's input starts with the BOS token
    start_token = esp_sp.bos_id()
    end_token = esp_sp.eos_id()

    # Each hypothesis on the beam is a (token sequence, cumulative log-prob) pair
    beam = [([start_token], 0.0)]
    completed_hypotheses = []

    for _ in range(config.MAX_LENGTH):
        new_beam = []
        for seq, score in beam:
            # A hypothesis that already ended in EOS is finished
            if seq[-1] == end_token:
                completed_hypotheses.append((seq, score))
                continue

            decoder_input = tf.constant([seq], dtype=tf.int64)
            predictions = transformer([encoder_input, decoder_input], training=False)

            # Get the log probabilities of the next possible tokens
            last_token_probs = predictions[:, -1, :]
            log_probs = tf.math.log(last_token_probs)

            # Get the top k most likely next tokens
            top_k_log_probs, top_k_indices = tf.nn.top_k(log_probs, k=beam_width)

            for i in range(beam_width):
                new_token = top_k_indices[0, i]
                new_log_prob = top_k_log_probs[0, i].numpy()
                # Convert the token to a native Python int so the sequence
                # stays a plain list of IDs
                new_seq = seq + [new_token.numpy().item()]
                new_score = score + new_log_prob
                new_beam.append((new_seq, new_score))

        # If every hypothesis has ended in EOS, we can stop early
        if not new_beam:
            break

        # Sort all candidate hypotheses by their length-normalized score
        # and keep only the top k
        beam = sorted(new_beam, key=lambda x: x[1] / len(x[0]), reverse=True)[:beam_width]

    # Add any hypotheses still on the beam to the completed list
    completed_hypotheses.extend(beam)

    # Find the best translation among the completed hypotheses
    if not completed_hypotheses:
        return ""
    best_hypothesis = sorted(completed_hypotheses, key=lambda x: x[1] / len(x[0]), reverse=True)[0]
    best_seq = best_hypothesis[0]

    # Decode the sequence of token IDs back to a text string
    predicted_sentence = esp_sp.decode(best_seq)
    return predicted_sentence
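One detail in the code above is worth calling out: hypotheses are ranked by their score divided by their length. Log-probabilities are always negative, so without this normalization the raw sum would systematically favor shorter outputs. A tiny self-contained sketch shows the effect:

import math

# Each step is equally likely (p = 0.9), but the longer sequence
# accumulates a lower raw score simply because it has more steps.
short_steps = [math.log(0.9)] * 3
long_steps = [math.log(0.9)] * 8
print(sum(short_steps), sum(long_steps))          # raw sums: the short one "wins"
print(sum(short_steps) / 3, sum(long_steps) / 8)  # per-token: a tie, as it should be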
Now it's time to test our model on some sentences. For now we will use samples from the validation set only, so that we can compare against a reference and see whether the translations are exactly the same, close enough, or completely different.
sample_index = 89
english_sentence = valid_df.iloc[sample_index]['english']
reference_spanish = valid_df.iloc[sample_index]['spanish']

# Use the translate function to get the model's prediction
predicted_spanish = translate(english_sentence)

print(f"English Input: {english_sentence}")
print(f"Reference Spanish: {reference_spanish}")
print(f"Predicted Spanish: {predicted_spanish}")
With this, we’ve reached the end of our series. Throughout this series, we built an English-to-Spanish translation system entirely from scratch — beginning with dataset preparation, moving through model design and training, and finally exploring inference strategies such as Greedy Search and Beam Search.
This end-to-end process not only demonstrated how transformer-based architectures can be applied to sequence-to-sequence translation tasks, but also highlighted the importance of careful experimentation and evaluation at each stage.
As always, the field of natural language processing continues to evolve rapidly. Keep exploring, stay curious, and continue building on these foundations to push your understanding and your models even further.