{"id":84348,"date":"2025-07-30T11:41:59","date_gmt":"2025-07-30T06:11:59","guid":{"rendered":"https:\/\/www.guvi.in\/blog\/?p=84348"},"modified":"2025-08-16T18:47:25","modified_gmt":"2025-08-16T13:17:25","slug":"fine-tuning-llms-with-unsloth-and-ollama","status":"publish","type":"post","link":"https:\/\/www.guvi.in\/blog\/fine-tuning-llms-with-unsloth-and-ollama\/","title":{"rendered":"Fine-Tuning LLMs with Unsloth and Ollama: A Step-by-Step Guide"},"content":{"rendered":"\n<p>Ever wished you could make a language model work exactly the way your application demands, without relying on expensive cloud APIs or off-the-shelf limitations? That\u2019s where fine-tuning LLMs comes in.<\/p>\n\n\n\n<p>In this step-by-step guide, we\u2019ll walk through how to fine-tune a large language model using Unsloth, then run it locally using Ollama. Whether you&#8217;re working with structured outputs or domain-specific data, this hands-on approach gives you full control over your LLM\u2019s behavior.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>I. Introduction to Fine-Tuning LLMs<\/strong><\/h2>\n\n\n\n<p>Fine-tuning is the process of adapting a pre-trained language model to perform better on a specific task by retraining it on task-relevant data. Think of it like training a skilled chef on your restaurant&#8217;s specific menu rather than teaching someone to cook from scratch.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Key Differences<\/strong><\/h3>\n\n\n\n<ul>\n<li><strong>Fine-tuning<\/strong> retrains the model on new data, updating its weights.<br><\/li>\n\n\n\n<li><strong>Parameter tuning<\/strong> adjusts inference behavior (e.g., temperature, top_k) without altering the model\u2019s weights.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>II. 
When Should You Fine-Tune?<\/strong><\/h2>\n\n\n\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" width=\"1200\" height=\"630\" src=\"https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/08\/01@2x-1200x630.png\" alt=\"When Should You Fine-Tune\" class=\"wp-image-85062\" srcset=\"https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/08\/01@2x-1200x630.png 1200w, https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/08\/01@2x-300x158.png 300w, https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/08\/01@2x-768x403.png 768w, https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/08\/01@2x-1536x806.png 1536w, https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/08\/01@2x-2048x1075.png 2048w, https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/08\/01@2x-150x79.png 150w\" sizes=\"(max-width: 1200px) 100vw, 1200px\" title=\"\"><\/figure>\n\n\n\n<p>Fine-tuning becomes valuable when:<\/p>\n\n\n\n<ul>\n<li>You need outputs in a <strong>specific format<\/strong> (e.g., structured JSON).<br><\/li>\n\n\n\n<li>You work with <strong>domain-specific data<\/strong> (e.g., medical records).<br><\/li>\n\n\n\n<li>You want <strong>cost-effective<\/strong> models that perform well without relying on large-scale LLMs.<br><\/li>\n\n\n\n<li><strong>Trade-off:<\/strong> Fine-tuned models are more specialized and may lose general-purpose versatility.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>III. 
Practical Implementation with Unsloth<\/strong><\/h2>\n\n\n\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" width=\"1200\" height=\"630\" src=\"https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/08\/02@2x-1200x630.png\" alt=\"Practical Implementation with Unsloth\" class=\"wp-image-85063\" srcset=\"https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/08\/02@2x-1200x630.png 1200w, https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/08\/02@2x-300x158.png 300w, https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/08\/02@2x-768x403.png 768w, https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/08\/02@2x-1536x806.png 1536w, https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/08\/02@2x-2048x1075.png 2048w, https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/08\/02@2x-150x79.png 150w\" sizes=\"(max-width: 1200px) 100vw, 1200px\" title=\"\"><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Step-by-Step Setup Using Google Colab<\/strong><\/h3>\n\n\n\n<p>Complete code and datasets are available at <a href=\"https:\/\/github.com\/BASILAHAMED\/LLM-Fine-Tuning.git\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/github.com\/BASILAHAMED\/LLM-Fine-Tuning.git<\/a><\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>1. Import datasets and Install Unsloth<\/strong><\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code>import json\n\nfile = json.load(open(\"json_extraction_dataset_500.json\", \"r\"))\n\nprint(file&#91;1])\n\n# install unsloth and other dependencies\n\n!pip install unsloth trl peft accelerate bitsandbytes<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>2. Verify GPU Access<\/strong><\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code>import torch\n\nprint(f\"CUDA available: {torch.cuda.is_available()}\")\n\nprint(f\"GPU: {torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'None'}\")<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>3. 
Load Model Using Unsloth<\/strong><\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code>from unsloth import FastLanguageModel\n\nmodel_name = \"unsloth\/Phi-3-mini-4k-instruct-bnb-4bit\"\n\nmax_seq_length = 2048\n\ndtype = None\n\nmodel, tokenizer = FastLanguageModel.from_pretrained(\n\n&nbsp;&nbsp;&nbsp;&nbsp;model_name=model_name,\n\n&nbsp;&nbsp;&nbsp;&nbsp;max_seq_length=max_seq_length,\n\n&nbsp;&nbsp;&nbsp;&nbsp;dtype=dtype,\n\n&nbsp;&nbsp;&nbsp;&nbsp;load_in_4bit=True,\n\n)<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>4. Format the Dataset<\/strong><\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code>from datasets import Dataset\n\ndef format_prompt(example):\n\n&nbsp;return f\"### Input: {example&#91;'input']}\\n### Output: {json.dumps(example&#91;'output'])}&lt;|endoftext|&gt;\"\n\nformatted_data = &#91;format_prompt(item) for item in file]\n\ndataset = Dataset.from_dict({\"text\": formatted_data})<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>5. Apply LoRA Adapters<\/strong><\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code>model = FastLanguageModel.get_peft_model(\n\n&nbsp;&nbsp;&nbsp;&nbsp;model,\n\n&nbsp;&nbsp;&nbsp;&nbsp;r=64,\n\n&nbsp;&nbsp;&nbsp;&nbsp;target_modules=&#91;\"q_proj\", \"k_proj\", \"v_proj\", \"o_proj\", \"gate_proj\", \"up_proj\", \"down_proj\"],\n\n&nbsp;&nbsp;&nbsp;&nbsp;lora_alpha=128,\n\n&nbsp;&nbsp;&nbsp;&nbsp;lora_dropout=0,\n\n&nbsp;&nbsp;&nbsp;&nbsp;bias=\"none\",\n\n&nbsp;&nbsp;&nbsp;&nbsp;use_gradient_checkpointing=\"unsloth\",\n\n&nbsp;&nbsp;&nbsp;&nbsp;random_state=3407,\n\n&nbsp;&nbsp;&nbsp;&nbsp;use_rslora=False,\n\n&nbsp;&nbsp;&nbsp;&nbsp;loftq_config=None,\n\n)<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>6. 
Train the Model<\/strong><\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code>from trl import SFTTrainer\n\nfrom transformers import TrainingArguments\n\ntrainer = SFTTrainer(\n\n&nbsp;&nbsp;&nbsp;&nbsp;model=model,\n\n&nbsp;&nbsp;&nbsp;&nbsp;tokenizer=tokenizer,\n\n&nbsp;&nbsp;&nbsp;&nbsp;train_dataset=dataset,\n\n&nbsp;&nbsp;&nbsp;&nbsp;dataset_text_field=\"text\",\n\n&nbsp;&nbsp;&nbsp;&nbsp;max_seq_length=max_seq_length,\n\n&nbsp;&nbsp;&nbsp;&nbsp;dataset_num_proc=2,\n\n&nbsp;&nbsp;&nbsp;&nbsp;args=TrainingArguments(\n\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;per_device_train_batch_size=2,\n\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;gradient_accumulation_steps=4,\n\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;warmup_steps=10,\n\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;num_train_epochs=3,\n\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;learning_rate=2e-4,\n\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;fp16=not torch.cuda.is_bf16_supported(),\n\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;bf16=torch.cuda.is_bf16_supported(),\n\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;logging_steps=25,\n\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;optim=\"adamw_8bit\",\n\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;weight_decay=0.01,\n\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;lr_scheduler_type=\"linear\",\n\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;seed=3407,\n\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;output_dir=\"outputs\",\n\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;save_strategy=\"epoch\",\n\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;save_total_limit=2,\n\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;dataloader_pin_memory=False,\n\n&nbsp;&nbsp;&nbsp;&nbsp;),\n\n)\n\ntrainer_stats = trainer.train()<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>7. 
Run Inference<\/strong><\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code>FastLanguageModel.for_inference(model)\n\nmessages = &#91;\n\n&nbsp;&nbsp;&nbsp;&nbsp;{\"role\": \"user\", \"content\": \"Extract the product information:\\n&lt;div class='product'&gt;&lt;h2&gt;iPad Air&lt;\/h2&gt;&lt;span class='price'&gt;$1344&lt;\/span&gt;&lt;span class='category'&gt;audio&lt;\/span&gt;&lt;span class='brand'&gt;Dell&lt;\/span&gt;&lt;\/div&gt;\"},\n\n]\n\ninputs = tokenizer.apply_chat_template(\n\n&nbsp;&nbsp;&nbsp;&nbsp;messages,\n\n&nbsp;&nbsp;&nbsp;&nbsp;tokenize=True,\n\n&nbsp;&nbsp;&nbsp;&nbsp;add_generation_prompt=True,\n\n&nbsp;&nbsp;&nbsp;&nbsp;return_tensors=\"pt\",\n\n).to(\"cuda\")\n\noutputs = model.generate(\n\n&nbsp;&nbsp;&nbsp;&nbsp;input_ids=inputs,\n\n&nbsp;&nbsp;&nbsp;&nbsp;max_new_tokens=256,\n\n&nbsp;&nbsp;&nbsp;&nbsp;use_cache=True,\n\n&nbsp;&nbsp;&nbsp;&nbsp;temperature=0.7,\n\n&nbsp;&nbsp;&nbsp;&nbsp;do_sample=True,\n\n&nbsp;&nbsp;&nbsp;&nbsp;top_p=0.9,\n\n)\n\nresponse = tokenizer.batch_decode(outputs)&#91;0]\n\nprint(response)<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>8. Export in GGUF Format for Ollama<\/strong><\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code>model.save_pretrained_gguf(\"gguf_model\", tokenizer, quantization_method=\"q4_k_m\")\n\nimport os\n\nfrom google.colab import files\n\ngguf_files = &#91;f for f in os.listdir(\"gguf_model\") if f.endswith(\".gguf\")]\n\nif gguf_files:\n\n&nbsp;&nbsp;&nbsp;&nbsp;gguf_file = os.path.join(\"gguf_model\", gguf_files&#91;0])\n\n&nbsp;&nbsp;&nbsp;&nbsp;print(f\"Downloading: {gguf_file}\")\n\n&nbsp;&nbsp;&nbsp;&nbsp;files.download(gguf_file)<\/code><\/pre>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>IV. 
Running the Fine-Tuned Model with Ollama<\/strong><\/h2>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Steps:<\/strong><\/h3>\n\n\n\n<ol>\n<li>Create a new directory and move the .gguf file into it.<br><\/li>\n\n\n\n<li>Inside that directory, create a file named <strong>Modelfile<\/strong> (no extension).<br><\/li>\n\n\n\n<li>Add the following to the file (replace &lt;model_name&gt;.gguf with your file\u2019s name):<\/li>\n<\/ol>\n\n\n\n<pre class=\"wp-block-code\"><code>FROM .\/&lt;model_name&gt;.gguf\n\nPARAMETER top_p 0.9\n\nPARAMETER temperature 0.2\n\nPARAMETER stop \"&lt;|im_start|&gt;user\"\n\nPARAMETER stop \"&lt;|end_of_text|&gt;\"\n\nTEMPLATE \"&lt;|im_start|&gt;user\\n{{.Prompt}}&lt;|im_end|&gt;\\n&lt;|im_start|&gt;assistant\\n{{.Response}}&lt;|im_end|&gt;\\n\"\n\nSYSTEM \"You are a helpful AI assistant.\"<\/code><\/pre>\n\n\n\n<ol start=\"4\">\n<li>Build and run the model:<\/li>\n<\/ol>\n\n\n\n<pre class=\"wp-block-code\"><code>ollama create &lt;model_name&gt; -f Modelfile\n\nollama run &lt;model_name&gt;<\/code><\/pre>\n\n\n\n<p>In case you want to explore more on Artificial Intelligence and Machine Learning, consider enrolling for GUVI\u2019s <a href=\"https:\/\/www.guvi.in\/zen-class\/artificial-intelligence-and-machine-learning-course\/?utm_source=organic+&amp;utm_medium=blog&amp;utm_campaign=fine-tuning-llms\" target=\"_blank\" rel=\"noreferrer noopener\">Artificial Intelligence and Machine Learning Course<\/a>, which teaches everything related to it with an industry-grade certificate!&nbsp;<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Conclusion<\/strong><\/h2>\n\n\n\n<p>In conclusion, fine-tuning with Unsloth and deploying via Ollama isn\u2019t just a cost-saving move\u2014it\u2019s a power move. You get a lightweight, task-optimized model running securely on your own machine. 
From structured JSON extraction to domain-specific reasoning, this setup lets you push your LLM workflows further, faster, and without vendor lock-in.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Ever wished you could make a language model work exactly the way your application demands, without relying on expensive cloud APIs or off-the-shelf limitations? That\u2019s where fine-tuning LLMs comes in. In this step-by-step guide, we\u2019ll walk through how to fine-tune a large language model using Unsloth, then run it locally using Ollama. Whether you&#8217;re working [&hellip;]<\/p>\n","protected":false},"author":48,"featured_media":85061,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[933],"tags":[],"views":"8190","authorinfo":{"name":"Basil Ahamed","url":"https:\/\/www.guvi.in\/blog\/author\/basil-ahamed-s\/"},"thumbnailURL":"https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/07\/Fine-Tuning-LLMs-with-Unsloth-and-Ollama_-A-Step-by-Step-Guide-300x116.png","jetpack_featured_media_url":"https:\/\/www.guvi.in\/blog\/wp-content\/uploads\/2025\/07\/Fine-Tuning-LLMs-with-Unsloth-and-Ollama_-A-Step-by-Step-Guide.png","_links":{"self":[{"href":"https:\/\/www.guvi.in\/blog\/wp-json\/wp\/v2\/posts\/84348"}],"collection":[{"href":"https:\/\/www.guvi.in\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.guvi.in\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.guvi.in\/blog\/wp-json\/wp\/v2\/users\/48"}],"replies":[{"embeddable":true,"href":"https:\/\/www.guvi.in\/blog\/wp-json\/wp\/v2\/comments?post=84348"}],"version-history":[{"count":7,"href":"https:\/\/www.guvi.in\/blog\/wp-json\/wp\/v2\/posts\/84348\/revisions"}],"predecessor-version":[{"id":85065,"href":"https:\/\/www.guvi.in\/blog\/wp-json\/wp\/v2\/posts\/84348\/revisions\/85065"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.guvi.in\/blog\/wp-json\/
wp\/v2\/media\/85061"}],"wp:attachment":[{"href":"https:\/\/www.guvi.in\/blog\/wp-json\/wp\/v2\/media?parent=84348"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.guvi.in\/blog\/wp-json\/wp\/v2\/categories?post=84348"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.guvi.in\/blog\/wp-json\/wp\/v2\/tags?post=84348"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}