{"id":82499,"date":"2025-07-01T15:36:18","date_gmt":"2025-07-01T10:06:18","guid":{"rendered":"https:\/\/www.guvi.in\/blog\/?p=82499"},"modified":"2025-09-10T11:21:58","modified_gmt":"2025-09-10T05:51:58","slug":"guide-to-building-qa-systems-using-transformers","status":"publish","type":"post","link":"https:\/\/www.guvi.in\/blog\/guide-to-building-qa-systems-using-transformers\/","title":{"rendered":"Step-by-Step Guide to Building QA Systems Using Transformers"},"content":{"rendered":"\n<p>In this blog, we walk through a step-by-step guide to building QA systems using transformers.<\/p>\n\n\n\n<p>Question answering systems built on transformers are widely used in natural language processing, have a broad range of applications, and return answers in human-understandable form.<\/p>\n\n\n\n<p>Such a system can be implemented with several mechanisms. We will discuss each mechanism in detail. Let\u2019s get started!<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">RNN and LSTM<\/h2>\n\n\n\n<p>A recurrent neural network (RNN) is a generalization of a feedforward neural network that has internal memory. An RNN uses its internal state (memory) to process a sequence of inputs.<\/p>\n\n\n\n<p>Long short-term memory (LSTM) networks are modified recurrent neural networks that make it easier to retain past data in memory, resolving the vanishing gradient problem. LSTM is well suited to classifying, processing, and predicting time series with time lags of unknown duration. The model is trained using backpropagation.<\/p>\n\n\n\n<p>Recurrent neural networks and long short-term memory models are almost identical in two core properties. Sequential processing: sentences must be processed word by word. 
Past information is retained through past hidden states: these sequence models follow the Markov property, so each state is assumed to depend only on the state before it.<\/p>\n\n\n\n<p>The first property is the reason why RNNs and LSTMs cannot be trained in parallel: information is retained only through previously computed hidden states. One way people mitigated the one-directional flow of information is to work with bidirectional models, which encode the same sentence from two directions, start to end and end to start.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Long Live Transformers<\/h2>\n\n\n\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" width=\"1200\" height=\"636\" src=\"https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/07\/Why-Do-We-Need-This-Smart-Food-Application_-2-1200x636.png\" alt=\"transformer architecture\" class=\"wp-image-83720\" srcset=\"https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/07\/Why-Do-We-Need-This-Smart-Food-Application_-2-1200x636.png 1200w, https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/07\/Why-Do-We-Need-This-Smart-Food-Application_-2-300x159.png 300w, https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/07\/Why-Do-We-Need-This-Smart-Food-Application_-2-768x407.png 768w, https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/07\/Why-Do-We-Need-This-Smart-Food-Application_-2-1536x814.png 1536w, https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/07\/Why-Do-We-Need-This-Smart-Food-Application_-2-2048x1085.png 2048w, https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/07\/Why-Do-We-Need-This-Smart-Food-Application_-2-150x80.png 150w\" sizes=\"(max-width: 1200px) 100vw, 1200px\" title=\"\"><\/figure>\n\n\n\n<p>The paper \u201cAttention Is All You Need\u201d introduced the transformer, which is built on what is called a sequence-to-sequence architecture. 
Seq2seq is a <a href=\"https:\/\/www.guvi.in\/blog\/neural-networks-in-machine-learning\/\" target=\"_blank\" rel=\"noreferrer noopener\">neural network<\/a> that transforms a given sequence into another sequence for a specific task. The most famous application of seq2seq models is translation, where a sequence of words in one language is transformed into a sequence in another language. A popular choice for this type of model has been the Long Short-Term Memory-based model.<\/p>\n\n\n\n<p>Transformer models were born to solve these problems of LSTMs: the attention mechanism replaces the recurrent mechanism.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Five steps to understand the mechanism of transformers<\/h3>\n\n\n\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" width=\"1200\" height=\"630\" src=\"https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/07\/Steps-to-understand-the-mechanism-of-transformers-1200x630.png\" alt=\"Steps to understand the mechanism of transformers\" class=\"wp-image-83717\" srcset=\"https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/07\/Steps-to-understand-the-mechanism-of-transformers-1200x630.png 1200w, https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/07\/Steps-to-understand-the-mechanism-of-transformers-300x158.png 300w, https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/07\/Steps-to-understand-the-mechanism-of-transformers-768x403.png 768w, https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/07\/Steps-to-understand-the-mechanism-of-transformers-1536x806.png 1536w, https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/07\/Steps-to-understand-the-mechanism-of-transformers-2048x1075.png 2048w, https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/07\/Steps-to-understand-the-mechanism-of-transformers-150x79.png 150w\" sizes=\"(max-width: 1200px) 100vw, 1200px\" title=\"\"><\/figure>\n\n\n\n<p>This is the core idea of transformers. 
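At its heart is self-attention, which the steps below walk through. As a preview, the whole computation fits in a few lines of NumPy (a minimal single-head sketch with toy dimensions; the matrix names WQ, WK, WV follow the text):<\/p>\n\n\n\n

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the last axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, WQ, WK, WV):
    # Step 1: project each word embedding into query, key, and value vectors.
    Q, K, V = X @ WQ, X @ WK, X @ WV
    d_k = K.shape[-1]
    # Step 2: score every pair of words (Q.K^T), scale by sqrt(d_k), softmax.
    weights = softmax(Q @ K.T / np.sqrt(d_k))
    # Remaining steps: weight the value vectors and sum them into Z, one row per word.
    return weights @ V, weights

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))                        # 5 words, embedding size 8
WQ, WK, WV = (rng.normal(size=(8, 4)) for _ in range(3))
Z, weights = self_attention(X, WQ, WK, WV)
print(Z.shape)  # (5, 4): one attention vector Z per word
```

\n\n\n\n<p>Each row of the weights matrix sums to 1: it says how much every other word is expressed at that position.<\/p>\n\n\n\n<p>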
Self-attention is an attention mechanism that relates different positions of a single sequence in order to compute a representation of the sequence.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">FIRST STEP:<\/h4>\n\n\n\n<p>For each word, we create three vectors: Q (query), K (key), and V (value).&nbsp;These vectors are created by multiplying the word\u2019s embedding by three weight matrices (WQ, WK, WV).<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">SECOND STEP:&nbsp;<\/h4>\n\n\n\n<p>We score each word against every other word by taking the dot product of its query vector with their key vectors. We divide each score by the square root of the dimension of the key vector, then apply the softmax function to <strong>determine how much each word will be expressed at this position<\/strong>.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">THIRD STEP:&nbsp;<\/h4>\n\n\n\n<p><strong>Multiply each value vector by its softmax score<\/strong> to keep the important related words and drown out the others.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">FINAL STEP:&nbsp;<\/h4>\n\n\n\n<p><strong>Sum the weighted value vectors to obtain the attention vector Z for the word<\/strong>. Repeat these steps for every word to obtain the attention matrix for the sentence.<\/p>\n\n\n\n<p>Are you interested in learning more about transformers? Enroll in HCL Guvi\u2019s IITM Pravartak certified <a href=\"https:\/\/www.guvi.in\/zen-class\/artificial-intelligence-and-machine-learning-course\/?utm_source=Blog&amp;utm_medium=Hyperlink&amp;utm_campaign=Guide+to+Building+QA+Systems+Using+Transformers\" target=\"_blank\" rel=\"noreferrer noopener\">Artificial Intelligence and Machine Learning Course<\/a>. 
It covers the important concepts of artificial intelligence, from basics such as the <a href=\"https:\/\/www.guvi.in\/blog\/the-rise-raise-of-artificial-intelligence\/\">history of AI<\/a> and <a href=\"https:\/\/www.guvi.in\/blog\/benefits-of-learning-python\/\" target=\"_blank\" rel=\"noreferrer noopener\">Python programming<\/a> to advanced topics such as the transformer architecture and <a href=\"https:\/\/www.guvi.in\/blog\/artificial-intelligence-llms-and-prompting\/\" target=\"_blank\" rel=\"noreferrer noopener\">LLMs<\/a>, with hands-on projects.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">What is BERT?<\/h3>\n\n\n\n<p>BERT, which stands for Bidirectional Encoder Representations from Transformers, was developed by researchers at Google in 2018. It is based on the transformer, a deep learning model in which every output element is connected to every input element, and the weightings between them are calculated dynamically based on their connection.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" width=\"1200\" height=\"636\" src=\"https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/07\/What-is-BERT_-1200x636.png\" alt=\"What is BERT\" class=\"wp-image-83718\" srcset=\"https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/07\/What-is-BERT_-1200x636.png 1200w, https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/07\/What-is-BERT_-300x159.png 300w, https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/07\/What-is-BERT_-768x407.png 768w, https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/07\/What-is-BERT_-1536x814.png 1536w, https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/07\/What-is-BERT_-2048x1085.png 2048w, https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/07\/What-is-BERT_-150x80.png 150w\" sizes=\"(max-width: 1200px) 100vw, 1200px\" title=\"\"><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\">Why BERT?<\/h3>\n\n\n\n<p>BERT helps the search engine understand the significance of small function words like 
\u2018to\u2019 and \u2018for\u2019 in the keywords used.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" width=\"1200\" height=\"636\" src=\"https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/07\/Why-BERT_-1200x636.png\" alt=\"Why BERT\" class=\"wp-image-83719\" srcset=\"https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/07\/Why-BERT_-1200x636.png 1200w, https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/07\/Why-BERT_-300x159.png 300w, https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/07\/Why-BERT_-768x407.png 768w, https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/07\/Why-BERT_-1536x814.png 1536w, https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/07\/Why-BERT_-2048x1085.png 2048w, https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/07\/Why-BERT_-150x80.png 150w\" sizes=\"(max-width: 1200px) 100vw, 1200px\" title=\"\"><\/figure>\n\n\n\n<p>For question answering, BERT takes two inputs, the question and the passage, packed together as a single sequence. We then fine-tune the model so that its output marks the answer span that exists in the passage.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">What are Longformers?<\/h3>\n\n\n\n<p>Transformer-based language models have been leading the <a href=\"https:\/\/www.guvi.in\/blog\/must-know-nlp-hacks-for-beginners\/\" target=\"_blank\" rel=\"noreferrer noopener\">NLP<\/a> benchmarks lately. Models like BERT and RoBERTa have been state-of-the-art for a while. However, one major drawback of these models is that they cannot attend to long sequences.<\/p>\n\n\n\n<p>To overcome this long-sequence limitation, the Longformer essentially combines several attention patterns:<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">1. SLIDING WINDOW:<\/h4>\n\n\n\n<p>The name speaks for itself. 
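Each token attends only to a band of neighbors around its own position, which in code reduces to a banded boolean attention mask. Here is a small illustrative sketch (the helper name and dilation parameter are our own; the dilation step covers the dilated variant described below as well):<\/p>\n\n\n\n

```python
import numpy as np

def sliding_window_mask(seq_len, w, dilation=1):
    # mask[i, j] is True when token i may attend to token j:
    # j must lie within w // 2 steps on either side of i,
    # with each step stretched by `dilation` (gaps in the window)
    # for the dilated sliding-window variant.
    mask = np.zeros((seq_len, seq_len), dtype=bool)
    half = w // 2
    for i in range(seq_len):
        for offset in range(-half, half + 1):
            j = i + offset * dilation
            if 0 <= j < seq_len:
                mask[i, j] = True
    return mask

plain = sliding_window_mask(8, w=4)                # 2 neighbours on each side
dilated = sliding_window_mask(8, w=4, dilation=2)  # same cost, twice the reach
print(plain[4])  # True only near position 4
```

\n\n\n\n<p>Global attention (pattern 3 below) simply sets selected rows and columns of such a mask entirely to True, e.g., for the question tokens in a QA task.<\/p>\n\n\n\n<p>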
In this approach, we take an arbitrary window size w, and each token in the sequence attends to only about w tokens (typically w\/2 to the left and w\/2 to the right).<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">2. DILATED SLIDING WINDOW:<\/h4>\n\n\n\n<p>Here the window has gaps: we skip every other token when attending. The idea is to cover a vastly larger effective window, since with the same number of attended tokens the reach is wider, so information propagates faster across the layers without increasing the model&#8217;s computation.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">3. GLOBAL ATTENTION (full self-attention):<\/h4>\n\n\n\n<p>Let&#8217;s consider the same example of QA tasks. In the case of the Longformer, we can give all the question tokens a global attention pattern, i.e., have them attend to all the other tokens in the sequence.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Steps in Building QA Systems Using Transformers&nbsp;<\/h2>\n\n\n\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" width=\"1200\" height=\"630\" src=\"https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/07\/Steps-to-Build-Question-and-Answer-Systems-Using-Transformers-1200x630.png\" alt=\"Steps to Build Question and Answer Systems Using Transformers\" class=\"wp-image-83716\" srcset=\"https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/07\/Steps-to-Build-Question-and-Answer-Systems-Using-Transformers-1200x630.png 1200w, https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/07\/Steps-to-Build-Question-and-Answer-Systems-Using-Transformers-300x158.png 300w, https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/07\/Steps-to-Build-Question-and-Answer-Systems-Using-Transformers-768x403.png 768w, https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/07\/Steps-to-Build-Question-and-Answer-Systems-Using-Transformers-1536x806.png 1536w, https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/07\/Steps-to-Build-Question-and-Answer-Systems-Using-Transformers-2048x1075.png 2048w, 
https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/07\/Steps-to-Build-Question-and-Answer-Systems-Using-Transformers-150x79.png 150w\" sizes=\"(max-width: 1200px) 100vw, 1200px\" title=\"\"><\/figure>\n\n\n\n<p>Step 1. Install <a href=\"https:\/\/www.anaconda.com\/products\/individual\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">Anaconda<\/a>.<\/p>\n\n\n\n<p>Step 2. Create an Anaconda environment with Python version 3.7.<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><tbody><tr><td>conda create -n QAS_longformer python=3.7<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>Step 3. Activate the environment.<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><tbody><tr><td>conda activate QAS_longformer<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>Step 4. Install PyTorch. We recommend a CUDA build for fast training.<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><tbody><tr><td># install pytorch with cuda version<br>pip install torch==1.7.0+cu101 torchvision==0.8.1+cu101 torchaudio==0.7.0 -f https:\/\/download.pytorch.org\/whl\/torch_stable.html<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>Step 5. 
Install the transformers and simpletransformers libraries.<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><tbody><tr><td>pip install transformers<br>pip install simpletransformers<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\">CODE FOR BUILDING A QUESTION AND ANSWER SYSTEM USING TRANSFORMERS:<\/h3>\n\n\n\n<figure class=\"wp-block-table\"><table><tbody><tr><td>import time<br><br>from simpletransformers.question_answering import QuestionAnsweringModel, QuestionAnsweringArgs<br><br>model_args = QuestionAnsweringArgs(overwrite_output_dir=True, doc_stride=80)<br>model_args.evaluate_during_training = True<br><br># After training, replace the second argument with the path to the<br># outputs directory that training generated, as below.<br>model = QuestionAnsweringModel('longformer', '.\/outputs', use_cuda=True, args=model_args)<br><br>def predictset():<br>&nbsp;&nbsp;&nbsp;&nbsp;while True:<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;input_question = input('question: ')<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if input_question == 'break':<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;print('QAS: good bye!')<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;return<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;to_predict = [{<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;'context': '&lt;input your context here&gt;',<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;'qas': [{'question': input_question, 'id': '0'}],<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}]<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;start_time = time.time()<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;answers, probabilities = model.predict(to_predict)<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;print('--- %s seconds ---' % (time.time() - start_time))<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;# answers[0]['answer'] holds the predicted spans; show the best one.<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;print('QAS:', answers[0]['answer'][0])<br><br>predictset()<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">CONCLUSION<\/h2>\n\n\n\n<p>By understanding the basics of transformers and their architecture, you can apply them to a variety of tasks. In this blog, we discussed the steps involved in building question and answer systems, one of the top applications of transformers.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In this blog, we walk through a step-by-step guide to building QA systems using transformers. Question answering systems built on transformers are widely used in natural language processing, have a broad range of applications, and return answers in human-understandable form. 
[&hellip;]<\/p>\n","protected":false},"author":57,"featured_media":83715,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[933],"tags":[],"views":"1423","authorinfo":{"name":"Roopa Dharshini","url":"https:\/\/www.guvi.in\/blog\/author\/roopa-dharshini\/"},"thumbnailURL":"https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/07\/Building-QA-Systems-Using-Transformers-300x116.png","jetpack_featured_media_url":"https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/07\/Building-QA-Systems-Using-Transformers.png","_links":{"self":[{"href":"https:\/\/www.guvi.in\/blog\/wp-json\/wp\/v2\/posts\/82499"}],"collection":[{"href":"https:\/\/www.guvi.in\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.guvi.in\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.guvi.in\/blog\/wp-json\/wp\/v2\/users\/57"}],"replies":[{"embeddable":true,"href":"https:\/\/www.guvi.in\/blog\/wp-json\/wp\/v2\/comments?post=82499"}],"version-history":[{"count":8,"href":"https:\/\/www.guvi.in\/blog\/wp-json\/wp\/v2\/posts\/82499\/revisions"}],"predecessor-version":[{"id":86858,"href":"https:\/\/www.guvi.in\/blog\/wp-json\/wp\/v2\/posts\/82499\/revisions\/86858"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.guvi.in\/blog\/wp-json\/wp\/v2\/media\/83715"}],"wp:attachment":[{"href":"https:\/\/www.guvi.in\/blog\/wp-json\/wp\/v2\/media?parent=82499"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.guvi.in\/blog\/wp-json\/wp\/v2\/categories?post=82499"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.guvi.in\/blog\/wp-json\/wp\/v2\/tags?post=82499"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}