Step-by-Step Guide to Building QA Systems Using Transformers

By Roopa Dharshini

Jul 01, 2025 (Last Updated)

In this blog, we walk through a step-by-step guide to building question answering (QA) systems using transformers.

Question answering systems built on transformers are widely used in natural language processing. They have a wide range of applications and return answers in a form that humans can readily understand.

A question answering system using transformers can be implemented with several methods and mechanisms. We will discuss each mechanism in detail. Let’s get started!

RNN and LSTM
Long Live Transformers

Five steps to understand the mechanism of transformers
What is BERT?
Why BERT?
What are Longformers?

Steps in Building QA Systems Using Transformers

CODE FOR BUILDING A QUESTION AND ANSWER SYSTEM USING TRANSFORMERS:

CONCLUSION

RNN and LSTM

A recurrent neural network (RNN) is a generalization of a feedforward neural network that has an internal memory. An RNN uses its internal state (memory) to process a sequence of inputs.

Long short-term memory (LSTM) networks are a modified version of recurrent neural networks that make it easier to retain past data in memory. They mitigate the vanishing gradient problem. LSTM is well suited to classifying, processing, and predicting time series with time lags of unknown duration. The model is trained using backpropagation.

For the purposes of this discussion, recurrent neural networks and long short-term memory models are almost identical in their core properties:

1. Sequential processing: sentences must be processed word by word.
2. Past information is retained through past hidden states: sequence models follow the Markov property, so each state is assumed to depend only on the previously seen state.

The first property is the reason why RNN and LSTM can’t be trained in parallel. The second means that information in RNN and LSTM is retained only through previously computed hidden states, so the influence of early words fades over long sequences. One way people mitigated this problem is to work with bidirectional models, which encode the same sentence from two directions, from start to end and from end to start.
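To make the sequential dependency concrete, here is a minimal sketch of a plain RNN forward pass in NumPy. The function name `rnn_forward`, the weight names, and the toy sizes are assumptions chosen for illustration; they are not taken from any particular library.

```python
import numpy as np

def rnn_forward(inputs, W_x, W_h, b):
    """Run a plain RNN over a sequence and return every hidden state."""
    hidden = np.zeros(W_h.shape[0])  # initial memory is all zeros
    states = []
    for x_t in inputs:  # one word vector at a time, in order
        # The new state needs the previous state, so step t cannot
        # start before step t-1 has finished: no parallelism over time.
        hidden = np.tanh(W_x @ x_t + W_h @ hidden + b)
        states.append(hidden)
    return np.stack(states)

# Toy sizes, assumed for the example only.
rng = np.random.default_rng(0)
seq_len, embed_dim, hidden_dim = 5, 4, 3
inputs = rng.normal(size=(seq_len, embed_dim))
W_x = rng.normal(size=(hidden_dim, embed_dim))
W_h = rng.normal(size=(hidden_dim, hidden_dim))
b = np.zeros(hidden_dim)

states = rnn_forward(inputs, W_x, W_h, b)
print(states.shape)
```

The loop over `inputs` is the part that cannot be vectorized across time steps, which is what the attention mechanism discussed later avoids.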

Long Live Transformers

The paper “Attention Is All You Need” describes the Transformer, which uses what is called a sequence-to-sequence (seq2seq) architecture. Seq2seq is a neural net that transforms a given sequence into another sequence for a specific task. The most famous application of seq2seq models is translation, where a sequence of words in one language is transformed into a sequence of words in another language. Before transformers, a popular choice for this type of model was the Long Short-Term Memory-based model.

Transformer models were created to solve these problems of LSTM. The attention mechanism replaces the recurrent mechanism.

Five steps to understand the mechanism of transformers

Self-attention is the core idea of transformers. It is an attention mechanism that relates different positions of a single sequence in order to compute a representation of that sequence.

FIRST STEP:

For each word, we create three vectors: a query Q, a key K, and a value V. These vectors are created by multiplying the word’s embedding by three learned matrices (WQ, WK, WV).

SECOND STEP:

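The score between two words is the dot product of the query vector of the current word with the key vector of the other word. We divide this score by the square root of the dimension of the key vectors, then apply the softmax function to determine how much each word will be expressed at this position.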

THIRD STEP:

Multiply each value vector by its softmax score to keep the important related words and suppress the others.

FINAL STEP:

Sum the weighted V vectors to get the attention vector Z for a word. Repeating these steps for every word gives the attention matrix for the whole sentence.
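The steps above can be sketched in a few lines of NumPy. This is a minimal single-head illustration, not a full transformer layer: the score is taken to be the dot product of each query with each key, as in standard scaled dot-product attention, and the names `self_attention`, `softmax`, and the toy dimensions are assumptions made for the example.

```python
import numpy as np

def softmax(x):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def self_attention(X, W_Q, W_K, W_V):
    """Single-head self-attention over a sentence of word embeddings X."""
    # Create Q, K, V by multiplying the embeddings by three matrices.
    Q, K, V = X @ W_Q, X @ W_K, X @ W_V
    # Score every word against every other word (query dot key).
    scores = Q @ K.T
    # Scale by sqrt(d_k), then softmax so each row sums to 1.
    weights = softmax(scores / np.sqrt(K.shape[-1]))
    # Weight the value vectors and sum them to get Z for each word.
    Z = weights @ V
    return Z, weights

# Toy sizes, assumed for the example only.
rng = np.random.default_rng(0)
n_words, d_model, d_k = 4, 8, 3
X = rng.normal(size=(n_words, d_model))
W_Q, W_K, W_V = (rng.normal(size=(d_model, d_k)) for _ in range(3))

Z, weights = self_attention(X, W_Q, W_K, W_V)
print(Z.shape)        # one attention vector per word
print(weights.shape)  # one weight per pair of words
```

Because the whole computation is matrix multiplication, every word is processed at once, with no loop over positions.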

What is BERT?

BERT, which stands for Bidirectional Encoder Representations from Transformers, was developed by researchers at Google in 2018. It is based on the Transformer, a deep learning model in which every output element is connected to every input element and the weightings between them are calculated dynamically based on their connection.

Why BERT?

BERT helps search engines understand the significance of small connecting words like ‘to’ and ‘for’ in the keywords used.

For a question answering system, BERT takes two inputs, the question and the passage, packed into a single sequence. The model is then fine-tuned to predict where the answer starts and ends within the passage.
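A rough sketch of that packed sequence is shown below. It assumes a naive lowercase whitespace tokenizer instead of BERT's real WordPiece tokenizer, and the helper names `pack_question_and_passage` and `extract_answer` are hypothetical. The start and end positions are picked by hand here; in a real system they would come from the fine-tuned model.

```python
def pack_question_and_passage(question, passage):
    """Pack question and passage as: [CLS] question [SEP] passage [SEP]."""
    q_tokens = question.lower().split()  # naive tokenizer (assumption)
    p_tokens = passage.lower().split()
    tokens = ["[CLS]"] + q_tokens + ["[SEP]"] + p_tokens + ["[SEP]"]
    # Segment 0 marks the question part, segment 1 marks the passage part.
    segment_ids = [0] * (len(q_tokens) + 2) + [1] * (len(p_tokens) + 1)
    return tokens, segment_ids

def extract_answer(tokens, start, end):
    """Return the answer span given start and end token positions (inclusive)."""
    return " ".join(tokens[start:end + 1])

tokens, segment_ids = pack_question_and_passage(
    "Who developed BERT ?",
    "BERT was developed by researchers at Google .",
)
print(tokens)
print(segment_ids)

# Hand-picked span standing in for the model's predicted start and end.
print(extract_answer(tokens, 10, 12))
```

The answer is always a slice of the passage, which is why this style of QA is called extractive.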

What are Longformers?

Transformer-based language models have been leading the NLP benchmarks lately. Models like BERT and RoBERTa have been state-of-the-art for a while. However, one major drawback of these models is that they cannot attend to long sequences, because the cost of full self-attention grows quadratically with the sequence length. BERT, for example, is limited to 512 tokens.

To overcome these long-sequence issues, the Longformer combines several attention patterns:

1. SLIDING WINDOW:

The name speaks for itself. In this approach, we choose a window size w, and each token in the sequence attends only to w neighboring tokens (typically w/2 to the left and w/2 to the right).

2. DILATED SLIDING WINDOW:

Here the window has gaps: for example, each token skips one word between the positions it attends to. This gives a much larger effective attention window for the same w, so information propagates faster across the layers without increasing the model’s computation.

3. GLOBAL ATTENTION (full self-attention):

Consider a QA task as an example. In the case of Longformer, we can give all the question tokens a global attention pattern, i.e., have them attend to all the other tokens in the sequence and have all the other tokens attend to them.
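The three patterns can be visualized as boolean masks, where entry (i, j) says whether token i may attend to token j. The sketch below builds full n x n masks for clarity only; the actual Longformer avoids materializing the full matrix. The function names and the toy sequence length are assumptions for illustration.

```python
import numpy as np

def sliding_window_mask(seq_len, window, dilation=1):
    """mask[i, j] is True when token i may attend to token j."""
    half = window // 2
    idx = np.arange(seq_len)
    offset = idx[None, :] - idx[:, None]        # j - i for every pair
    within = np.abs(offset) <= half * dilation  # inside the (stretched) window
    on_grid = offset % dilation == 0            # skip the positions in the gaps
    return within & on_grid

def add_global_attention(mask, global_positions):
    """Let the given tokens attend everywhere and be attended to by everyone."""
    mask = mask.copy()
    mask[global_positions, :] = True
    mask[:, global_positions] = True
    return mask

local = sliding_window_mask(seq_len=8, window=2)
dilated = sliding_window_mask(seq_len=8, window=2, dilation=2)
# Pretend tokens 0 and 1 are the question tokens of a QA input.
with_global = add_global_attention(local, [0, 1])

print(np.flatnonzero(local[4]))        # neighbors of token 4
print(np.flatnonzero(dilated[4]))      # same count, wider reach
print(np.flatnonzero(with_global[7]))  # local neighbors plus question tokens
```

Token 4 attends to the same number of positions in both the plain and the dilated mask, but the dilated positions span a wider stretch of the sequence.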

Steps in Building QA Systems Using Transformers

Step 1. Install Anaconda.

Step 2. Create an Anaconda environment with Python version 3.7.

conda create -n QAS_longformer python=3.7

Step 3. Activate the environment.

conda activate QAS_longformer

Step 4. Install PyTorch. We recommend the CUDA build for fast training.

# install PyTorch with the CUDA version
pip install torch==1.7.0+cu101 torchvision==0.8.1+cu101 torchaudio==0.7.0 -f https://download.pytorch.org/whl/torch_stable.html

Step 5. Install the Transformer library.

pip install transformers
pip install simpletransformers
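Before moving on, it can help to confirm that the packages are visible inside the activated environment. The following is a small sketch using only the Python standard library; the helper name `check_packages` is hypothetical.

```python
import importlib.util

def check_packages(names):
    """Return a dict mapping each package name to True if it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}

status = check_packages(["torch", "transformers", "simpletransformers"])
for name, installed in status.items():
    print(f"{name}: {'found' if installed else 'MISSING'}")
```

If any package is reported as missing, check that the QAS_longformer environment is active and rerun the corresponding install command.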

CODE FOR BUILDING A QUESTION AND ANSWER SYSTEM USING TRANSFORMERS:

import time

from simpletransformers.question_answering import QuestionAnsweringModel, QuestionAnsweringArgs

# Model settings: overwrite earlier outputs and use a stride of 80 tokens
# when a long context is split into overlapping chunks.
model_args = QuestionAnsweringArgs(overwrite_output_dir=True, doc_stride=80)
model_args.evaluate_during_training = True

# After training, point the model at the directory that training wrote
# its outputs to (here "./outputs").
model = QuestionAnsweringModel(
    "longformer", "./outputs", use_cuda=True, args=model_args
)

EXIT_PHRASE = "break"


def predictset():
    """Ask for questions in a loop and print the model's answer to each."""
    while True:
        input_question = input("question: ")
        if input_question == EXIT_PHRASE:
            print("QAS: good bye!")
            break
        to_predict = [{
            "context": "<input your context here>",
            "qas": [{"question": input_question, "id": "0"}],
        }]
        start_time = time.time()
        answers, probabilities = model.predict(to_predict)
        print("--- %s seconds ---" % (time.time() - start_time))
        # answers[0]["answer"] is a list of candidate answers, best first.
        print(answers[0]["answer"][0])


if __name__ == "__main__":
    predictset()
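The script above loads a model that has already been trained into ./outputs. As a hedged sketch of what the training data could look like, the helper below builds one record in the SQuAD-style format that simpletransformers question answering models accept for training. The helper name `make_training_example` and the sample text are assumptions for illustration, and the code itself needs no extra packages.

```python
def make_training_example(context, question, answer_text, example_id):
    """Build one SQuAD-style record with the answer's character offset."""
    start = context.find(answer_text)
    if start == -1:
        raise ValueError("answer_text must appear verbatim in context")
    return {
        "context": context,
        "qas": [{
            "id": example_id,
            "question": question,
            "is_impossible": False,
            "answers": [{"text": answer_text, "answer_start": start}],
        }],
    }

train_data = [
    make_training_example(
        "BERT was developed by researchers at Google in 2018.",
        "Who developed BERT?",
        "researchers at Google",
        "00001",
    )
]
print(train_data[0]["qas"][0]["answers"])

# With simpletransformers installed, a list like this would typically be
# passed to model.train_model(train_data), together with evaluation data
# when evaluate_during_training is enabled.
```

Computing `answer_start` from the context guards against offsets that drift out of sync with the text.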

CONCLUSION

By understanding the basics of transformers and their architecture, you can use transformers for various applications. In this blog, we discussed the steps involved in building question and answer systems using transformers, which is one of the top applications of transformers.

About the Author

Roopa Dharshini

I am an open source contributor and project admin, full stack developer, and technical content writer. A core member of Guvi's SHECODES community, with expertise in Data Structures, Generative AI, MERN Stack, and Cloud Platforms. Fun Fact: Hates Bootstrap!
