
Step-by-Step Guide to Building QA Systems Using Transformers

By Roopa Dharshini

In this blog, we will walk through a step-by-step guide to building QA systems using transformers.

Question and answer systems built with transformers are commonly used in the field of natural language processing. They have a wide range of applications and return answers to questions in a human-understandable form.

A question and answer system using transformers can be implemented with several methods and mechanisms. We will discuss each mechanism in detail. Let's get started!

Table of contents


  1. RNN and LSTM
  2. Long Live Transformers
    • Five steps to understand the mechanism of transformers
    • What is BERT?
    • Why BERT?
    • What are Longformers?
  3. Steps in Building QA Systems Using Transformers
    • CODE FOR BUILDING A QA SYSTEM USING TRANSFORMERS:
  4. CONCLUSION

RNN and LSTM

A Recurrent Neural Network (RNN) is a generalization of a feedforward neural network that has internal memory. An RNN uses its internal state (memory) to process a sequence of inputs.

Long Short-Term Memory (LSTM) networks are a modified version of recurrent neural networks that makes it easier to retain past data in memory, which resolves the vanishing gradient problem. LSTM is well suited to classifying, processing, and predicting time series with time lags of unknown duration. It is trained using backpropagation.
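
As a concrete illustration, here is a minimal sketch (assuming PyTorch) of an LSTM consuming a batch of embedded sequences; all sizes are arbitrary example values:

import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=32, hidden_size=64, batch_first=True)
x = torch.randn(8, 20, 32)      # (batch, sequence length, embedding dimension)
outputs, (h_n, c_n) = lstm(x)   # one hidden state per step, plus final states
print(outputs.shape)            # torch.Size([8, 20, 64])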

As far as this task is concerned, recurrent neural networks and Long Short-Term Memory models are almost identical in their core properties. Sequential processing: sentences must be processed word by word. Past information retained through hidden states: these sequence models follow the Markov property, where each state is assumed to depend only on the previously seen states.

The first property is the reason why RNN and LSTM can't be trained in parallel: information is retained only through previously computed hidden states. One way people mitigated this limitation is with bidirectional models, which encode the same sentence from two directions, start to end and end to start.
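
Continuing the sketch above, making the LSTM bidirectional only takes a flag; the per-step outputs of the two directions are concatenated:

import torch
import torch.nn as nn

bilstm = nn.LSTM(input_size=32, hidden_size=64, batch_first=True, bidirectional=True)
x = torch.randn(8, 20, 32)
outputs, _ = bilstm(x)
print(outputs.shape)  # torch.Size([8, 20, 128]): 64 per direction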

Long Live Transformers

The paper "Attention Is All You Need" introduced the Transformer, which uses what is called a sequence-to-sequence architecture. Seq2seq is a neural network that transforms a given sequence into another sequence for a specific task. The most famous application of seq2seq models is translation, where a sequence of words in one language is transformed into a sequence in another language. A popular choice for this type of model has been the Long Short-Term Memory-based model.

Transformer models were created to solve these problems of LSTMs: an attention mechanism replaces the recurrent mechanism.

Five steps to understand the mechanism of transformers

The core idea of transformers is self-attention: an attention mechanism that relates different positions of a single sequence in order to compute a representation of the sequence.

FIRST STEP:

For each word, we create 3 vectors Q, K, V. These vectors are created by multiplying the embedding by three matrices (WQ, WK, WV).

SECOND STEP: 

First, we score each word against the current word by taking the dot product of the query vector Q with each key vector K. We then divide each score by the square root of the dimension of the key vector, and apply the softmax function to determine how much each word will be expressed at this position.

THIRD STEP: 

Multiply each value vector by its softmax score to keep the important, related words and drown out the irrelevant ones.

FINAL STEP: 

Sum the weighted V vectors to obtain the attention vector Z for a word. Then repeat these steps for every word to obtain an attention matrix for the sentence.
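
Putting the five steps together, here is a minimal numpy sketch of scaled dot-product self-attention for a toy 4-token sentence (the sizes and random projection matrices are purely illustrative):

import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))      # 4 token embeddings of dimension 8
d_k = 8                          # dimension of the key vectors
W_Q, W_K, W_V = (rng.normal(size=(8, d_k)) for _ in range(3))

Q, K, V = X @ W_Q, X @ W_K, X @ W_V        # step 1: build Q, K, V per word
scores = Q @ K.T                           # score every word against every other
weights = softmax(scores / np.sqrt(d_k))   # step 2: scale, then softmax
Z = weights @ V                            # steps 3-4: weight and sum the values
print(Z.shape)                             # (4, 8): one attention vector per word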


Are you interested in learning more about transformers? Enroll in Guvi's IITM Pravartak certified Artificial Intelligence and Machine Learning Course. It covers all the important concepts of artificial intelligence, from basics such as the history of AI and Python programming to advanced topics, including transformer architecture and LLMs, with hands-on projects.


What is BERT?

BERT, which stands for Bidirectional Encoder Representations from Transformers, was developed by researchers at Google in 2018. It is based on the Transformer, a deep learning model in which every output element is connected to every input element, and the weightings between them are dynamically calculated based on their connection.

Why BERT?

BERT helps the search engine understand the significance of small connecting words like 'to' and 'for' in the keywords used.

For a question answering system, BERT takes two inputs, the question and the passage, packed together as a single sequence. We then fine-tune the output to predict the span of the answer within the passage.
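
As a quick illustration, here is a minimal sketch using the Hugging Face transformers pipeline API; the checkpoint named below is one public BERT model fine-tuned on SQuAD, and any question-answering checkpoint would work:

from transformers import pipeline

# Question answering with a BERT model fine-tuned on SQuAD.
qa = pipeline(
    "question-answering",
    model="bert-large-uncased-whole-word-masking-finetuned-squad",
)
result = qa(
    question="What does BERT stand for?",
    context="BERT stands for Bidirectional Encoder Representations from Transformers.",
)
print(result["answer"], result["score"])  # predicted span and its confidence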

What are Longformers?

Transformer-based language models have been leading the NLP benchmarks lately. Models like BERT and RoBERTa have been state-of-the-art for a while. However, one major drawback of these models is that they cannot attend to long sequences: the cost of full self-attention grows quadratically with sequence length, so inputs are typically capped at 512 tokens.

To overcome these long sequence issues, the Longformer essentially combines several attention patterns:

1. SLIDING WINDOW:

The name speaks for itself. In this approach, we choose an arbitrary window size w, and each token in the sequence attends only to about w tokens (mostly w/2 to the left and w/2 to the right).

2. DILATED SLIDING WINDOW:

Here the window has gaps: we skip a fixed number of tokens between attended positions. The idea is to cover a vastly greater effective window of attention; because the effective window size is larger, information can propagate faster across the layers without increasing the model's computation.

3. GLOBAL ATTENTION (full self-attention):

Let's consider the same example of QA tasks. In the case of Longformer, we can give all the question tokens a global attention pattern, i.e., have them attend to all the other tokens in the sequence (and have all tokens attend to them). A sketch of all three patterns appears below.
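
Here is a minimal numpy sketch of the three patterns as boolean attention masks over a toy 12-token sequence (the window size, dilation, and choice of global tokens are illustrative assumptions):

import numpy as np

n, w, dilation = 12, 4, 2
i, j = np.indices((n, n))  # mask[i, j] = True means token i attends to token j

# 1. Sliding window: each token attends to w/2 neighbours on each side.
sliding = np.abs(i - j) <= w // 2

# 2. Dilated sliding window: a similar budget of attended positions, but
# with gaps, so the effective receptive field is wider.
dilated = (np.abs(i - j) <= w) & ((i - j) % dilation == 0)

# 3. Global attention: selected tokens (e.g. the question tokens in QA)
# attend to everything, and everything attends to them.
mask = sliding.copy()
global_tokens = [0, 1]
mask[global_tokens, :] = True
mask[:, global_tokens] = True

print(mask.astype(int))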

Steps in Building QA Systems Using Transformers 

Step 1. Install Anaconda.

Step 2. Create an Anaconda environment with Python version 3.7.

conda create -n QAS_longformer python=3.7

Step 3. Activate the environment.

conda activate QAS_longformer

Step 4. We recommend using CUDA for fast training.

# install pytorch with cuda version
pip install torch==1.7.0+cu101 torchvision==0.8.1+cu101 torchaudio==0.7.0 -f https://download.pytorch.org/whl/torch_stable.html

Step 5. Install the Transformer library.

pip install transformers
pip install simpletransformers

CODE FOR BUILDING A QA SYSTEM USING TRANSFORMERS:

import time

from simpletransformers.question_answering import (
    QuestionAnsweringModel,
    QuestionAnsweringArgs,
)

# Optional: verbose library logging.
# import logging
# logging.basicConfig(level=logging.INFO)
# logging.getLogger("transformers").setLevel(logging.WARNING)

# doc_stride controls the overlap between the sliding chunks a long
# context is split into.
model_args = QuestionAnsweringArgs(overwrite_output_dir=True, doc_stride=80)
model_args.evaluate_during_training = True

# After training, load the model from the outputs directory the training
# run generated, as below.
model = QuestionAnsweringModel(
    "longformer", "./outputs", use_cuda=True, args=model_args
)


def predictset():
    # Type "break" to end the interactive session.
    input_question = input("question: ")
    if input_question == "break":
        print("QAS: good bye!")
        return
    to_predict = [{
        "context": "<input your context here>",
        "qas": [
            {
                "question": input_question,
                "id": "0",
            }
        ],
    }]
    start_time = time.time()
    answers, probabilities = model.predict(to_predict)
    print("--- %s seconds ---" % (time.time() - start_time))
    # answers[0] holds the candidate answers for our single question;
    # the first entry is the most likely one.
    print("QAS:", answers[0]["answer"][0])
    predictset()  # prompt for the next question


if __name__ == "__main__":
    predictset()
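
Note that the snippet above loads a model from ./outputs, which presumes you have already fine-tuned one. Here is a minimal training sketch, assuming SQuAD-style data and the allenai/longformer-base-4096 checkpoint; the single training record is a hypothetical example, and real training needs many of them:

# Hypothetical SQuAD-style training record.
train_data = [{
    "context": "The Transformer was introduced in 2017.",
    "qas": [{
        "id": "00001",
        "question": "When was the Transformer introduced?",
        "answers": [{"text": "2017", "answer_start": 34}],
        "is_impossible": False,
    }],
}]

train_model = QuestionAnsweringModel(
    "longformer", "allenai/longformer-base-4096", use_cuda=True, args=model_args
)
train_model.train_model(train_data)  # checkpoints land in ./outputs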

CONCLUSION

By understanding the basics of transformers and their architecture, you can use transformers for various applications. In this blog, we discussed the steps involved in building question and answer systems using transformers, which is one of the top applications of transformers. 
