{"id":106414,"date":"2026-04-10T18:15:23","date_gmt":"2026-04-10T12:45:23","guid":{"rendered":"https:\/\/www.guvi.in\/blog\/?p=106414"},"modified":"2026-04-10T18:15:26","modified_gmt":"2026-04-10T12:45:26","slug":"rag-chatbot-with-huggingface-and-streamlit","status":"publish","type":"post","link":"https:\/\/www.guvi.in\/blog\/rag-chatbot-with-huggingface-and-streamlit\/","title":{"rendered":"RAG Chatbot With HuggingFace And Streamlit: Complete Tutorial"},"content":{"rendered":"\n<p>If you&#8217;ve ever wished a chatbot could actually <em>know<\/em> your documents, not just give generic answers, that&#8217;s exactly what RAG makes possible.<\/p>\n\n\n\n<p>Most language models are powerful, but they&#8217;re limited to what they learned during training. They don&#8217;t know about your company&#8217;s internal policies, your research papers, or your product documentation. RAG fixes this by letting the model <em>retrieve<\/em> relevant information before it responds.<\/p>\n\n\n\n<p>In this tutorial, you&#8217;ll build a RAG chatbot from scratch using HuggingFace for the AI backbone and Streamlit for a clean, interactive UI. This is a hands-on guide, so expect code, explanations, and real decisions you&#8217;ll make as a developer. Without further ado, let us get started!<\/p>\n\n\n\n<p><strong>TL;DR Summary<\/strong><\/p>\n\n\n\n<p>1. This tutorial introduces RAG (Retrieval-Augmented Generation) chatbots and explains how they differ from standard chatbots by retrieving information directly from your own documents before generating a response.<\/p>\n\n\n\n<p>2. It walks through the complete environment setup, including installing HuggingFace, LangChain, FAISS, and Streamlit, and explains the role each tool plays in the pipeline.<\/p>\n\n\n\n<p>3. The guide covers the full RAG pipeline, loading and chunking documents, generating embeddings with HuggingFace&#8217;s all-MiniLM-L6-v2 model, and storing vectors using FAISS for fast similarity search.<\/p>\n\n\n\n<p>4. 
It includes a hands-on implementation of the retrieval chain using flan-t5-base as the language model, covering key parameters like chunk size, retrieval depth, and token limits.<\/p>\n\n\n\n<p>5. The tutorial walks through building a clean, interactive chat interface using Streamlit, complete with file upload, conversation history, and a source document viewer.<\/p>\n\n\n\n<p>6. It also covers common errors you may encounter during setup, practical fixes for each, and next steps to extend the chatbot with memory, multi-file support, and cloud deployment.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>What is a RAG Chatbot?<\/strong><\/h2>\n\n\n\n<p>RAG stands for <a href=\"https:\/\/www.guvi.in\/blog\/guide-for-retrieval-augmented-generation\/\" target=\"_blank\" rel=\"noreferrer noopener\"><strong>Retrieval-Augmented Generation<\/strong><\/a>. It&#8217;s an AI architecture that combines two things:<\/p>\n\n\n\n<ul>\n<li>A <strong>retrieval system<\/strong> that fetches relevant chunks of text from your documents<\/li>\n\n\n\n<li>A <strong>generation model<\/strong> that reads those chunks and produces a coherent answer<\/li>\n<\/ul>\n\n\n\n<p>Think of it like an open-book exam. Instead of the model relying purely on memory, it gets to look things up before answering. 
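<\/p>\n\n\n\n<p>In rough pseudocode (illustrative only; retrieve and generate here are placeholders for the real components you&#8217;ll build later in this tutorial), the flow looks like this:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>def answer(query, index):\n\n&nbsp;&nbsp;&nbsp;&nbsp;# 1. Look up the most relevant chunks first (the \"open book\")\n\n&nbsp;&nbsp;&nbsp;&nbsp;chunks = index.retrieve(query)\n\n&nbsp;&nbsp;&nbsp;&nbsp;# 2. Let the model answer using those chunks as context\n\n&nbsp;&nbsp;&nbsp;&nbsp;return llm.generate(question=query, context=chunks)<\/code><\/pre>\n\n\n\n<p>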
This makes responses far more accurate and grounded in your actual data.<\/p>\n\n\n\n<p>RAG chatbots are widely used for customer support systems, internal knowledge bases, document Q&amp;A tools, and research assistants.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>How Does RAG Work?<\/strong><\/h2>\n\n\n\n<p>Before writing a single line of code, it helps to understand what happens under the hood.<\/p>\n\n\n\n<p>Here&#8217;s the flow, step by step:<\/p>\n\n\n\n<ol>\n<li><strong>Document ingestion<\/strong>: You load your documents (PDF, text files, etc.)<\/li>\n\n\n\n<li><strong>Chunking<\/strong>: The documents are split into smaller, manageable pieces<\/li>\n\n\n\n<li><strong>Embedding<\/strong>: Each chunk is converted into a vector (a numerical representation)<\/li>\n\n\n\n<li><strong>Vector storage<\/strong>: These vectors are stored in a vector database<\/li>\n\n\n\n<li><strong>Query processing<\/strong>: When a user asks a question, it&#8217;s also converted into a vector<\/li>\n\n\n\n<li><strong>Retrieval<\/strong>: The system finds the most similar chunks to the query<\/li>\n\n\n\n<li><strong>Generation:<\/strong> The language model uses those chunks to generate a response<\/li>\n<\/ol>\n\n\n\n<p>Each step builds on the last, and you&#8217;ll implement all of them in this tutorial.<\/p>\n\n\n\n<p><em>Learn More: <\/em><a href=\"https:\/\/www.guvi.in\/blog\/how-to-build-rag-pipelines-in-ai-applications\/\" target=\"_blank\" rel=\"noreferrer noopener\"><em>How to Build RAG Pipelines in AI Applications<\/em><\/a><\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Tools You&#8217;ll Need<\/strong><\/h2>\n\n\n\n<p>Here&#8217;s a quick overview of everything this project relies on:<\/p>\n\n\n\n<ul>\n<li><a href=\"http:\/\/guvi.in\/hub\/python\" target=\"_blank\" rel=\"noreferrer noopener\"><strong>Python<\/strong><\/a><strong> 3.9+<\/strong>: The programming language for the entire project<\/li>\n\n\n\n<li><a 
href=\"https:\/\/huggingface.co\/blog\/noob_intro_transformers\" target=\"_blank\" rel=\"noreferrer noopener nofollow\"><strong>HuggingFace Transformers<\/strong><\/a>: For loading embedding models and LLMs<\/li>\n\n\n\n<li><strong>LangChain<\/strong>: To manage the retrieval chain and document processing<\/li>\n\n\n\n<li><strong>FAISS<\/strong>: A fast vector store for similarity search<\/li>\n\n\n\n<li><a href=\"https:\/\/www.guvi.in\/blog\/what-is-streamlit-in-python\/\" target=\"_blank\" rel=\"noreferrer noopener\"><strong>Streamlit<\/strong><\/a>: To build the chat interface<\/li>\n\n\n\n<li><strong>pypdf or pdfplumber<\/strong>: For reading PDF documents (LangChain&#8217;s PyPDFLoader relies on pypdf)<\/li>\n<\/ul>\n\n\n\n<p>You don&#8217;t need a GPU to follow along. HuggingFace offers lightweight models that run on CPU, though responses may be slightly slower.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Setting Up Your Environment<\/strong><\/h2>\n\n\n\n<p>Start by creating a virtual environment to keep dependencies clean.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>python -m venv rag-env\n\nsource rag-env\/bin\/activate&nbsp; # On Windows: rag-env\\Scripts\\activate<\/code><\/pre>\n\n\n\n<p>Now install all the required packages:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>pip install streamlit langchain langchain-community \\\n\n&nbsp;&nbsp;transformers sentence-transformers faiss-cpu \\\n\n&nbsp;&nbsp;pypdf huggingface_hub<\/code><\/pre>\n\n\n\n<p>Once installed, create a project folder structure like this:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>rag-chatbot\/\n\n\u2502\n\n\u251c\u2500\u2500 app.py\n\n\u251c\u2500\u2500 rag_pipeline.py\n\n\u251c\u2500\u2500 requirements.txt\n\n\u2514\u2500\u2500 docs\/&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; \u2190 your documents go here<\/code><\/pre>\n\n\n\n<p>Keeping your pipeline logic separate from the Streamlit app makes the project easier to debug and scale later.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Loading and 
Splitting Documents<\/strong><\/h2>\n\n\n\n<p>The first real step is getting your documents into the system. LangChain makes this straightforward with its document loaders.<\/p>\n\n\n\n<p>In rag_pipeline.py, start with this:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>from langchain_community.document_loaders import PyPDFLoader\n\nfrom langchain_text_splitters import RecursiveCharacterTextSplitter\n\ndef load_documents(file_path):\n\n&nbsp;&nbsp;&nbsp;&nbsp;loader = PyPDFLoader(file_path)\n\n&nbsp;&nbsp;&nbsp;&nbsp;documents = loader.load()\n\n&nbsp;&nbsp;&nbsp;&nbsp;return documents\n\ndef split_documents(documents):\n\n&nbsp;&nbsp;&nbsp;&nbsp;splitter = RecursiveCharacterTextSplitter(\n\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;chunk_size=500,\n\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;chunk_overlap=50\n\n&nbsp;&nbsp;&nbsp;&nbsp;)\n\n&nbsp;&nbsp;&nbsp;&nbsp;chunks = splitter.split_documents(documents)\n\n&nbsp;&nbsp;&nbsp;&nbsp;return chunks<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Why Chunk Size Matters<\/strong><\/h3>\n\n\n\n<p>Chunk size is one of those decisions that directly affects your chatbot&#8217;s quality.<\/p>\n\n\n\n<ul>\n<li><strong>Too large<\/strong>: The model receives too much context and struggles to focus<\/li>\n\n\n\n<li><strong>Too small<\/strong>: Important information gets cut off mid-sentence<\/li>\n\n\n\n<li><strong>500 characters with 50 overlap<\/strong>: A solid starting point for most use cases (RecursiveCharacterTextSplitter counts characters, not tokens)<\/li>\n<\/ul>\n\n\n\n<p>The overlap ensures that information near chunk boundaries isn&#8217;t lost during retrieval.<\/p>\n\n\n\n<div style=\"background-color: #099f4e; border: 3px solid #110053; border-radius: 12px; padding: 18px 22px; color: #FFFFFF; font-size: 18px; font-family: Montserrat, Helvetica, sans-serif; line-height: 1.6; box-shadow: 0 4px 12px rgba(0, 0, 0, 0.15); max-width: 750px;\"><strong style=\"font-size: 22px; color: #FFFFFF;\">\ud83d\udca1 Did You Know?<\/strong> <br \/><br \/>The concept behind RAG 
was introduced in a 2020 paper by Meta AI researchers. It was designed to reduce &#8220;hallucinations&#8221; \u2014 a common problem where language models confidently generate incorrect information. By grounding responses in retrieved documents, RAG significantly improves factual accuracy.<\/div>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Creating Embeddings With HuggingFace<\/strong><\/h2>\n\n\n\n<p>Embeddings are what make semantic search possible. They convert text into vectors so that similar meanings map to nearby points in a high-dimensional space.<\/p>\n\n\n\n<p>HuggingFace&#8217;s sentence-transformers library gives you access to excellent, free embedding models.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>from langchain_community.embeddings import HuggingFaceEmbeddings\n\ndef get_embeddings():\n\n&nbsp;&nbsp;&nbsp;&nbsp;embeddings = HuggingFaceEmbeddings(\n\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;model_name=\"sentence-transformers\/all-MiniLM-L6-v2\"\n\n&nbsp;&nbsp;&nbsp;&nbsp;)\n\n&nbsp;&nbsp;&nbsp;&nbsp;return embeddings<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Why <\/strong><strong>all-MiniLM-L6-v2<\/strong><strong>?<\/strong><\/h3>\n\n\n\n<p>This model is a great default choice for several reasons:<\/p>\n\n\n\n<ul>\n<li>It&#8217;s <strong>lightweight<\/strong> and runs efficiently on CPU<\/li>\n\n\n\n<li>It produces <strong>high-quality embeddings<\/strong> for English text<\/li>\n\n\n\n<li>It&#8217;s widely tested in production RAG systems<\/li>\n\n\n\n<li>It&#8217;s <strong>completely free<\/strong> to use<\/li>\n<\/ul>\n\n\n\n<p>If you need multilingual support, consider paraphrase-multilingual-MiniLM-L12-v2 instead.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Building the Vector Store<\/strong><\/h2>\n\n\n\n<p>Once you have your chunks and embeddings ready, you store them in <a href=\"https:\/\/faiss.ai\/index.html\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">FAISS<\/a>, a vector database developed by Meta 
that&#8217;s optimised for fast similarity search.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>from langchain_community.vectorstores import FAISS\n\ndef create_vector_store(chunks, embeddings):\n\n&nbsp;&nbsp;&nbsp;&nbsp;vector_store = FAISS.from_documents(chunks, embeddings)\n\n&nbsp;&nbsp;&nbsp;&nbsp;return vector_store\n\ndef save_vector_store(vector_store, path=\"faiss_index\"):\n\n&nbsp;&nbsp;&nbsp;&nbsp;vector_store.save_local(path)\n\ndef load_vector_store(path, embeddings):\n\n&nbsp;&nbsp;&nbsp;&nbsp;return FAISS.load_local(path, embeddings,\n\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;allow_dangerous_deserialization=True)<\/code><\/pre>\n\n\n\n<p>Saving the vector store locally means you don&#8217;t re-process documents every time the app restarts. For larger document sets, this saves significant time.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Setting Up the Retrieval Chain<\/strong><\/h2>\n\n\n\n<p>This is where everything connects. 
The retrieval chain takes a user query, finds the most relevant chunks, and passes them to the language model along with the question.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>from langchain_community.llms import HuggingFacePipeline\n\nfrom langchain.chains import RetrievalQA\n\nfrom transformers import pipeline\n\ndef get_llm():\n\n&nbsp;&nbsp;&nbsp;&nbsp;pipe = pipeline(\n\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;\"text2text-generation\",\n\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;model=\"google\/flan-t5-base\",\n\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;max_new_tokens=512\n\n&nbsp;&nbsp;&nbsp;&nbsp;)\n\n&nbsp;&nbsp;&nbsp;&nbsp;llm = HuggingFacePipeline(pipeline=pipe)\n\n&nbsp;&nbsp;&nbsp;&nbsp;return llm\n\ndef build_qa_chain(vector_store):\n\n&nbsp;&nbsp;&nbsp;&nbsp;llm = get_llm()\n\n&nbsp;&nbsp;&nbsp;&nbsp;retriever = vector_store.as_retriever(\n\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;search_kwargs={\"k\": 3}\n\n&nbsp;&nbsp;&nbsp;&nbsp;)\n\n&nbsp;&nbsp;&nbsp;&nbsp;qa_chain = RetrievalQA.from_chain_type(\n\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;llm=llm,\n\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;chain_type=\"stuff\",\n\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;retriever=retriever,\n\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;return_source_documents=True\n\n&nbsp;&nbsp;&nbsp;&nbsp;)\n\n&nbsp;&nbsp;&nbsp;&nbsp;return qa_chain<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Understanding the Key Parameters<\/strong><\/h3>\n\n\n\n<ul>\n<li><strong>k=3<\/strong>: Retrieves the top 3 most relevant chunks per query<\/li>\n\n\n\n<li><strong>chain_type=\"stuff\"<\/strong>: Passes all retrieved chunks directly into the prompt (best for smaller chunk sets)<\/li>\n\n\n\n<li><strong>return_source_documents=True<\/strong>: Lets you show users <em>where<\/em> the answer came from<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Choosing the Right 
Model<\/strong><\/h3>\n\n\n\n<p>flan-t5-base is a solid, CPU-friendly model for question answering. Here are a few alternatives depending on your needs:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><tbody><tr><td><strong>Model<\/strong><\/td><td><strong>Size<\/strong><\/td><td><strong>Best For<\/strong><\/td><\/tr><tr><td>flan-t5-base<\/td><td>~250MB<\/td><td>Quick prototyping, CPU use<\/td><\/tr><tr><td>flan-t5-large<\/td><td>~770MB<\/td><td>Better accuracy<\/td><\/tr><tr><td>mistralai\/Mistral-7B<\/td><td>~7GB<\/td><td>Production quality (needs GPU)<\/td><\/tr><\/tbody><\/table><figcaption class=\"wp-element-caption\"><strong>Choosing the Right Model<\/strong><\/figcaption><\/figure>\n\n\n\n<p>For local development, stick with flan-t5-base. You can always swap the model later.<\/p>\n\n\n\n<p>If you are interested in learning more about RAG and how Generative AI impacts the current technological landscape, consider reading HCL GUVI\u2019s <a href=\"https:\/\/www.guvi.in\/mlp\/genai-ebook?utm_source=blog&amp;utm_medium=hyperlink+&amp;utm_campaign=rag-chatbot-with-huggingface-and-streamlit\" target=\"_blank\" rel=\"noreferrer noopener\">Free Generative AI Ebook<\/a>, where you learn the basic mechanism of GenAI and its real-world applications in the fields of gaming, coding, entertainment, and many more.&nbsp;<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Building the Streamlit Interface<\/strong><\/h2>\n\n\n\n<p>Now for the part users actually see. 
Streamlit lets you build interactive web apps with pure Python, no frontend experience needed.<\/p>\n\n\n\n<p>Create your app.py file:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>import streamlit as st\n\nfrom rag_pipeline import (\n\n&nbsp;&nbsp;&nbsp;&nbsp;load_documents, split_documents, get_embeddings,\n\n&nbsp;&nbsp;&nbsp;&nbsp;create_vector_store, build_qa_chain\n\n)\n\nst.set_page_config(page_title=\"RAG Chatbot\", layout=\"wide\")\n\nst.title(\"\ud83d\udcc4 RAG Chatbot \u2014 Ask Your Documents\")\n\n# File upload\n\nuploaded_file = st.file_uploader(\"Upload a PDF\", type=&#91;\"pdf\"])\n\nif uploaded_file:\n\n&nbsp;&nbsp;&nbsp;&nbsp;with open(\"temp_doc.pdf\", \"wb\") as f:\n\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;f.write(uploaded_file.read())\n\n&nbsp;&nbsp;&nbsp;&nbsp;with st.spinner(\"Processing your document...\"):\n\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;docs = load_documents(\"temp_doc.pdf\")\n\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;chunks = split_documents(docs)\n\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;embeddings = get_embeddings()\n\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;vector_store = create_vector_store(chunks, embeddings)\n\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;qa_chain = build_qa_chain(vector_store)\n\n&nbsp;&nbsp;&nbsp;&nbsp;st.success(\"Document processed! 
Ask your question below.\")\n\n&nbsp;&nbsp;&nbsp;&nbsp;# Chat interface\n\n&nbsp;&nbsp;&nbsp;&nbsp;if \"messages\" not in st.session_state:\n\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;st.session_state.messages = &#91;]\n\n&nbsp;&nbsp;&nbsp;&nbsp;for msg in st.session_state.messages:\n\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;with st.chat_message(msg&#91;\"role\"]):\n\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;st.write(msg&#91;\"content\"])\n\n&nbsp;&nbsp;&nbsp;&nbsp;user_input = st.chat_input(\"Ask something about your document...\")\n\n&nbsp;&nbsp;&nbsp;&nbsp;if user_input:\n\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;st.session_state.messages.append(\n\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;{\"role\": \"user\", \"content\": user_input}\n\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;)\n\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;with st.chat_message(\"user\"):\n\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;st.write(user_input)\n\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;with st.chat_message(\"assistant\"):\n\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;with st.spinner(\"Thinking...\"):\n\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;result = qa_chain.invoke({\"query\": user_input})\n\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;answer = result&#91;\"result\"]\n\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;sources = result&#91;\"source_documents\"]\n\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;st.write(answer)\n\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;with st.expander(\"View Sources\"):\n\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;for i, doc in 
enumerate(sources):\n\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;st.write(f\"**Source {i+1}:**\")\n\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;st.write(doc.page_content&#91;:300] + \"...\")\n\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;st.session_state.messages.append(\n\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;{\"role\": \"assistant\", \"content\": answer}\n\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;)<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>What&#8217;s Happening in This Interface?<\/strong><\/h3>\n\n\n\n<p>The app is doing several things at once:<\/p>\n\n\n\n<ul>\n<li><strong>File upload<\/strong>: Users can upload any PDF directly in the browser<\/li>\n\n\n\n<li><strong>Session state<\/strong>: Stores conversation history so the chat feels continuous<\/li>\n\n\n\n<li><strong>Source display<\/strong>: Shows which document chunks were used to generate the answer<\/li>\n\n\n\n<li><strong>Spinner<\/strong>: Gives feedback while the model is working<\/li>\n<\/ul>\n\n\n\n<p>This is a clean, functional interface that covers the core experience without overcomplicating things.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Running Your RAG Chatbot<\/strong><\/h2>\n\n\n\n<p>You&#8217;re almost there. Run the app with a single command:<\/p>\n\n\n\n<p><code>streamlit run app.py<\/code><\/p>\n\n\n\n<p>Your browser will open automatically at http:\/\/localhost:8501. Upload a PDF, type a question, and watch your RAG chatbot answer from the document.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Common Errors and Fixes<\/strong><\/h2>\n\n\n\n<p>Even when you follow every step carefully, a few issues tend to come up. 
Here are the most common ones and how to handle them.<\/p>\n\n\n\n<p><strong>allow_dangerous_deserialization<\/strong><strong> error:<\/strong> This appears when loading a saved FAISS index. Add allow_dangerous_deserialization=True to your load_local() call \u2014 it&#8217;s safe when you&#8217;re loading your own saved files.<\/p>\n\n\n\n<p><strong>Model downloads are taking too long:<\/strong> HuggingFace downloads models on the first run. This is normal. Once cached, subsequent runs are fast. You can also pre-download models using huggingface-cli download.<\/p>\n\n\n\n<p><strong>Answers are too short or incomplete:<\/strong> Increase max_new_tokens in your pipeline. Try values between 256 and 1024, depending on the model and the type of answers you expect.<\/p>\n\n\n\n<p><strong>Out of memory errors:<\/strong> Switch to a smaller model or reduce your chunk size. For CPU-only machines, flan-t5-base is the safest option.<\/p>\n\n\n\n<p><strong>Retrieval returning irrelevant results:<\/strong> Try adjusting k in the retriever. 
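<\/p>\n\n\n\n<p>For example, here is a hedged sketch of a tuned retriever (it assumes the vector_store object built earlier in this tutorial; search_type=\"mmr\" is LangChain&#8217;s maximal-marginal-relevance option, which favours more diverse chunks):<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>retriever = vector_store.as_retriever(\n\n&nbsp;&nbsp;&nbsp;&nbsp;search_type=\"mmr\",&nbsp; # re-rank for diversity, not just raw similarity\n\n&nbsp;&nbsp;&nbsp;&nbsp;search_kwargs={\"k\": 5}&nbsp; # retrieve five chunks instead of three\n\n)<\/code><\/pre>\n\n\n\n<p>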
Also, experiment with your chunk size; sometimes, smaller chunks (around 300 characters) improve retrieval precision.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Taking It Further<\/strong><\/h2>\n\n\n\n<p>Once your base chatbot is working, there are several directions you can take it:<\/p>\n\n\n\n<ul>\n<li><strong>Add conversation memory<\/strong> using LangChain&#8217;s ConversationBufferMemory so the chatbot remembers earlier messages in the session<\/li>\n\n\n\n<li><strong>Support multiple file types<\/strong> by adding loaders for .txt, .docx, and .csv files<\/li>\n\n\n\n<li><strong>Deploy to the cloud<\/strong> using Streamlit Community Cloud, which offers free hosting for Streamlit apps<\/li>\n\n\n\n<li><strong>Switch to a more powerful model<\/strong> like Mistral or <a href=\"https:\/\/www.guvi.in\/blog\/how-to-use-code-llama\/\" target=\"_blank\" rel=\"noreferrer noopener\">LLaMA 2<\/a> for noticeably better answer quality<\/li>\n\n\n\n<li><strong>Add authentication<\/strong> if you&#8217;re building this for internal team use<\/li>\n<\/ul>\n\n\n\n<p>Each of these improvements takes your chatbot closer to a production-ready tool.<\/p>\n\n\n\n<p>If you\u2019re serious about building RAG applications with premium AI tools and want to apply them in real-world scenarios, don\u2019t miss the chance to enroll in HCL GUVI\u2019s <strong>Intel &amp; IITM Pravartak Certified<\/strong> <a href=\"https:\/\/www.guvi.in\/mlp\/artificial-intelligence-and-machine-learning\/?utm_source=blog&amp;utm_medium=hyperlink&amp;utm_campaign=rag-chatbot-with-huggingface-and-streamlit\" target=\"_blank\" rel=\"noreferrer noopener\"><strong>Artificial Intelligence &amp; Machine Learning Course<\/strong><\/a>, co-designed by Intel. 
It covers Python, Machine Learning, Deep Learning, Generative AI, Agentic AI, and MLOps through live online classes, 20+ industry-grade projects, and 1:1 doubt sessions, with placement support from 1000+ hiring partners.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Conclusion<\/strong><\/h2>\n\n\n\n<p>In conclusion, building a RAG chatbot with HuggingFace and Streamlit is one of the most practical ways to apply AI to real documents and real workflows. You&#8217;ve now covered the full pipeline, from loading and chunking documents to generating embeddings, running retrieval, and presenting everything through a clean chat interface.<\/p>\n\n\n\n<p>The real power of RAG isn&#8217;t just the technology, it&#8217;s the ability to make AI genuinely useful for your specific data. As open-source models continue to improve, building tools like this will only get more accessible and more powerful.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>FAQs<\/strong><\/h2>\n\n\n<div id=\"rank-math-faq\" class=\"rank-math-block\">\n<div class=\"rank-math-list \">\n<div id=\"faq-question-1775707700519\" class=\"rank-math-list-item\">\n<h3 class=\"rank-math-question \"><strong>1. What is a RAG chatbot?<\/strong>\u00a0<\/h3>\n<div class=\"rank-math-answer \">\n\n<p>A RAG chatbot is an AI system that retrieves relevant information from your documents before generating a response. This makes it more accurate than a standard chatbot that relies only on pre-trained knowledge.<\/p>\n\n<\/div>\n<\/div>\n<div id=\"faq-question-1775707702546\" class=\"rank-math-list-item\">\n<h3 class=\"rank-math-question \"><strong>2. Do I need a GPU to build this?<\/strong>\u00a0<\/h3>\n<div class=\"rank-math-answer \">\n\n<p>No. This tutorial uses flan-t5-base and all-MiniLM-L6-v2, both of which run on CPU. 
Responses may be slower, but it works without any special hardware.<\/p>\n\n<\/div>\n<\/div>\n<div id=\"faq-question-1775707707917\" class=\"rank-math-list-item\">\n<h3 class=\"rank-math-question \"><strong>3. What file types can I use with this chatbot?<\/strong>\u00a0<\/h3>\n<div class=\"rank-math-answer \">\n\n<p>In this tutorial, we use PDFs. LangChain supports many other formats including .txt, .docx, .csv, and web pages with minimal changes to the loader.<\/p>\n\n<\/div>\n<\/div>\n<div id=\"faq-question-1775707713420\" class=\"rank-math-list-item\">\n<h3 class=\"rank-math-question \"><strong>4. Is HuggingFace free to use?<\/strong>\u00a0<\/h3>\n<div class=\"rank-math-answer \">\n\n<p>Yes. All models used in this tutorial are freely available on HuggingFace Hub and can be downloaded and run locally at no cost.<\/p>\n\n<\/div>\n<\/div>\n<div id=\"faq-question-1775707720024\" class=\"rank-math-list-item\">\n<h3 class=\"rank-math-question \"><strong>5. What is FAISS and why do we use it?<\/strong>\u00a0<\/h3>\n<div class=\"rank-math-answer \">\n\n<p>FAISS (Facebook AI Similarity Search) is a library for fast vector similarity search. It lets you find the most relevant document chunks for any given query in milliseconds, even with large document sets.<\/p>\n\n<\/div>\n<\/div>\n<\/div>\n<\/div>","protected":false},"excerpt":{"rendered":"<p>If you&#8217;ve ever wished a chatbot could actually know your documents, not just give generic answers, that&#8217;s exactly what RAG makes possible. Most language models are powerful, but they&#8217;re limited to what they learned during training. They don&#8217;t know about your company&#8217;s internal policies, your research papers, or your product documentation. 
RAG fixes this by [&hellip;]<\/p>\n","protected":false},"author":22,"featured_media":106589,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[933],"tags":[],"views":"40","authorinfo":{"name":"Lukesh S","url":"https:\/\/www.guvi.in\/blog\/author\/lukesh\/"},"thumbnailURL":"https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2026\/04\/RAG-Chatbot-300x112.webp","jetpack_featured_media_url":"https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2026\/04\/RAG-Chatbot.webp","_links":{"self":[{"href":"https:\/\/www.guvi.in\/blog\/wp-json\/wp\/v2\/posts\/106414"}],"collection":[{"href":"https:\/\/www.guvi.in\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.guvi.in\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.guvi.in\/blog\/wp-json\/wp\/v2\/users\/22"}],"replies":[{"embeddable":true,"href":"https:\/\/www.guvi.in\/blog\/wp-json\/wp\/v2\/comments?post=106414"}],"version-history":[{"count":5,"href":"https:\/\/www.guvi.in\/blog\/wp-json\/wp\/v2\/posts\/106414\/revisions"}],"predecessor-version":[{"id":106680,"href":"https:\/\/www.guvi.in\/blog\/wp-json\/wp\/v2\/posts\/106414\/revisions\/106680"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.guvi.in\/blog\/wp-json\/wp\/v2\/media\/106589"}],"wp:attachment":[{"href":"https:\/\/www.guvi.in\/blog\/wp-json\/wp\/v2\/media?parent=106414"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.guvi.in\/blog\/wp-json\/wp\/v2\/categories?post=106414"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.guvi.in\/blog\/wp-json\/wp\/v2\/tags?post=106414"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}