{"id":14461,"date":"2024-09-03T08:00:39","date_gmt":"2024-09-03T06:00:39","guid":{"rendered":"https:\/\/clients.triloogia.ee\/proekspert\/wp-new\/?p=14461"},"modified":"2024-09-05T08:49:32","modified_gmt":"2024-09-05T06:49:32","slug":"chat-kratt-and-rag","status":"publish","type":"post","link":"https:\/\/clients.triloogia.ee\/proekspert\/wp-new\/blog\/proekspert-news\/chat-kratt-and-rag\/","title":{"rendered":"Chat, Kratt, and RAG"},"content":{"rendered":"\n<p><em>A software engineer offers tips on how to make AI work harder for you.<\/em><\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" width=\"1600\" height=\"840\" data-src=\"https:\/\/clients.triloogia.ee\/proekspert\/wp-new\/wp-content\/uploads\/2024\/09\/kratt_1600x840_BlogPicture.png\" alt=\"AI kratt\" class=\"wp-image-14462 lazyload\" data-srcset=\"https:\/\/clients.triloogia.ee\/proekspert\/wp-new\/wp-content\/uploads\/2024\/09\/kratt_1600x840_BlogPicture.png 1600w, https:\/\/clients.triloogia.ee\/proekspert\/wp-new\/wp-content\/uploads\/2024\/09\/kratt_1600x840_BlogPicture-300x158.png 300w, https:\/\/clients.triloogia.ee\/proekspert\/wp-new\/wp-content\/uploads\/2024\/09\/kratt_1600x840_BlogPicture-768x403.png 768w, https:\/\/clients.triloogia.ee\/proekspert\/wp-new\/wp-content\/uploads\/2024\/09\/kratt_1600x840_BlogPicture-1536x806.png 1536w, https:\/\/clients.triloogia.ee\/proekspert\/wp-new\/wp-content\/uploads\/2024\/09\/kratt_1600x840_BlogPicture-1200x630.png 1200w\" data-sizes=\"(max-width: 1600px) 100vw, 1600px\" src=\"data:image\/svg+xml;base64,PHN2ZyB3aWR0aD0iMSIgaGVpZ2h0PSIxIiB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciPjwvc3ZnPg==\" style=\"--smush-placeholder-width: 1600px; --smush-placeholder-aspect-ratio: 1600\/840;\" \/><figcaption class=\"wp-element-caption\">Image made with GPT4o<\/figcaption><\/figure>\n\n\n\n<p>At the end of the last century and the beginning of this one, phone directories were popular everywhere. 
In Estonia, they were advertised to businesses with the slogan, \u201cIf you&#8217;re not here, you don&#8217;t exist.\u201d<\/p>\n\n\n\n<p>Updated for today&#8217;s world, that slogan might be: \u201cIf you don&#8217;t build a RAG, you&#8217;ll miss the IT train and have to catch up by taxi.\u201d<\/p>\n\n\n\n<p>With the advent of AI based on large language models, building RAG (Retrieval-Augmented Generation) systems quickly became popular. These models have nuances that an average user might not immediately grasp.<\/p>\n\n\n\n<p>As an example of a language model, I offer the GPT model created by OpenAI, commonly known as ChatGPT. But I like to use the word <em>kratt<\/em>.<\/p>\n\n\n\n<p><strong>Chat \u2194\ufe0e Kratt \ud83d\ude0a<\/strong><\/p>\n\n\n\n<p>In Estonian folklore, a kratt is a magical creature created to serve its master and gather wealth.<\/p>\n\n\n\n<p>The GPT kratt is a \u201cvirtual thinking machine\u201d that, based on the text it receives (the prompt), generates the most likely next words or sentences in response (the completion).<\/p>\n\n\n\n<p>The kratt has no memory, so it carries no biases from the past. It doesn\u2019t even remember the response it sent you just moments ago, so if something goes wrong, it won&#8217;t hold a grudge for long \ud83d\ude0a For a smooth conversation, though, the kratt needs to remember at least most of what has been discussed. There\u2019s a simple, though initially somewhat unusual, technique for this: throughout the conversation, you keep reminding the kratt of what was said earlier. The kratt has what\u2019s called a context window, where the previous conversation can be stored. The larger this window, the more earlier information it can hold, so the thread of the conversation doesn\u2019t get lost. 
You can also add other important information to the context window, such as an explanation that the Estonian word <em>palk<\/em>, in this context, means \u201ca piece of a tree trunk\u201d and not \u201csalary.\u201d<\/p>\n\n\n\n<p>Additionally, it\u2019s important to remember that the kratt is frozen in time \u2013 it answers questions based on what seemed most relevant at the time it was created. However, you can add updated information to the context window, such as \u201cEstonia won the 2024 European Football Championship.\u201d If you add this information and later ask the kratt who won the 2024 Championship, it will respond that Estonia did&nbsp;\ud83d\ude09<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">\u201cKratt whispering\u201d or prompt engineering<\/h2>\n\n\n\n<p>This technique for communicating with the kratt is actually taught in workshops with names like \u201cSuccessful Manifestation,\u201d \u201cGoal Setting and Clear Formulation,\u201d \u201cAdvanced Role Playing,\u201d and so on \ud83d\ude0a<\/p>\n\n\n\n<p>Basically, you need to tell the kratt exactly what you want and in what role it should respond to you. This role part is fascinating \ud83d\ude09 That\u2019s why, for example, I start my GPT kratt whispering with the words: \u201cYou are\u2026\u201d<\/p>\n\n\n\n<p>So, if I wanted the kratt to respond to me like a \u201crude pirate,\u201d I would just need to say so.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p><strong><em>Prompt:<\/em><\/strong><em> You&#8217;re a fearsome, rude pirate. What drink do you prefer?<\/em><\/p>\n\n\n\n<p><strong><em>Answer:<\/em><\/strong><em> Arrrg, of course, it&#8217;s strong rum, and none of that weak swill! It&#8217;s the drink that puts hair on your chest and makes a pirate feel like the true king of the seas! If you see me drinking, it&#8217;s always rum, burning like hellfire itself and tasting like revenge! 
Arrrg, that&#8217;s my drink, and you&#8217;d best keep your hands off, or I&#8217;ll show you why they call me fearsome and rude!<\/em><\/p>\n<\/blockquote>\n\n\n\n<p>You need to express yourself clearly, unambiguously, and in line with the context you have created. When communicating with a kratt, you must keep in mind that you&#8217;ll get what you ask for \ud83d\ude09<\/p>\n\n\n\n<p>For example, suppose you ask, &#8220;How can I achieve financial freedom?&#8221;<\/p>\n\n\n\n<p>You might expect an answer like: &#8220;Financial freedom is a state where you have enough financial resources to live the lifestyle you desire without worrying about daily expenses. To achieve this\u2026&#8221;<\/p>\n\n\n\n<p>But you could also get a response like: &#8220;Let yourself be stripped of all finances or give them away. That&#8217;s how you&#8217;ll achieve financial freedom.&#8221;<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">The token<\/h2>\n\n\n\n<p>The basic unit for using a kratt is the token. Tokens are what kratt service providers charge you for. In Estonian text, one token roughly corresponds to one syllable of a word. When communicating with a kratt, if a paid model is in use, you pay for both the question (input tokens) and the response (output tokens). The Estonian saying \u201c<em>meil lobisemise eest palka ei maksta<\/em>\u201d (&#8220;We don&#8217;t get paid for chit-chat&#8221;) becomes especially relevant in this context. 
A prompt needs to be concise and to the point.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full is-resized\"><img decoding=\"async\" width=\"1026\" height=\"778\" data-src=\"https:\/\/clients.triloogia.ee\/proekspert\/wp-new\/wp-content\/uploads\/2024\/09\/prices.png\" alt=\"\" class=\"wp-image-14464 lazyload\" style=\"--smush-placeholder-width: 1026px; --smush-placeholder-aspect-ratio: 1026\/778;width:841px;height:auto\" data-srcset=\"https:\/\/clients.triloogia.ee\/proekspert\/wp-new\/wp-content\/uploads\/2024\/09\/prices.png 1026w, https:\/\/clients.triloogia.ee\/proekspert\/wp-new\/wp-content\/uploads\/2024\/09\/prices-300x227.png 300w, https:\/\/clients.triloogia.ee\/proekspert\/wp-new\/wp-content\/uploads\/2024\/09\/prices-768x582.png 768w\" data-sizes=\"(max-width: 1026px) 100vw, 1026px\" src=\"data:image\/svg+xml;base64,PHN2ZyB3aWR0aD0iMSIgaGVpZ2h0PSIxIiB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciPjwvc3ZnPg==\" \/><figcaption class=\"wp-element-caption\">Pricing list of Microsoft Azure cloud service OpenAI LLM models<\/figcaption><\/figure>\n\n\n\n<p>The table above is an excerpt from the current Microsoft Azure price list for OpenAI LLM models.<\/p>\n\n\n\n<p>So, how can you keep costs under control when conversing with a kratt? Here are a few techniques:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>One technique is to take a \u201cbasic kratt\u201d and train it to become a \u201cbiased kratt,\u201d tailored to your needs. This technique is called fine-tuning. Parents, by the way, do this every day. They add their own patterns of information and behavior to a child\u2019s basic personality.<\/li>\n\n\n\n<li>Another technique is to find shortcuts. Instead of providing lengthy explanations of context, you can find a magical short phrase. Remember the roles? \ud83d\ude09 I only had to tell the kratt to act like a rude pirate. 
I didn\u2019t need to explain how a rude pirate should behave: the kratt drew that from its \u201cconsciousness,\u201d matching my own idea of a rude pirate.<\/li>\n\n\n\n<li>The third option is to avoid resending the entire lengthy conversation each time. The goal is to keep the kratt\u2019s context window as small and as relevant as possible. If you previously talked to the kratt about \u201cthe garden\u201d and are now discussing \u201cthe hole,\u201d there\u2019s no need to keep resending the garden-related conversation.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Now back to RAG<\/h2>\n\n\n\n<p>This is where RAG comes into play: it keeps a conversation with the kratt relevant, up to date, and context-based.<\/p>\n\n\n\n<p>RAG is essentially an architecture that enhances the capabilities of the kratt. It\u2019s like a kratt on steroids.<\/p>\n\n\n\n<p>The key components, in addition to the large language model (LLM), are a knowledge base and a search engine. There may also be automatic tools that try to improve a user\u2019s poorly phrased prompt and create the necessary context.<\/p>\n\n\n\n<p>What does this achieve?<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>The kratt gains <strong>real-time knowledge<\/strong> that it didn\u2019t have at the time of its creation.<\/li>\n\n\n\n<li>The kratt can use information that isn\u2019t available on the internet, such as <strong>business secrets or other confidential data<\/strong>.<\/li>\n<\/ol>\n\n\n\n<p>The prompt becomes more effective and <strong>cost-efficient<\/strong>. 
It\u2019s as if the user is having a specialist rephrase the question to make it clearer and more understandable.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" width=\"994\" height=\"595\" data-src=\"https:\/\/clients.triloogia.ee\/proekspert\/wp-new\/wp-content\/uploads\/2024\/09\/RagSkeem.png\" alt=\"\" class=\"wp-image-14466 lazyload\" data-srcset=\"https:\/\/clients.triloogia.ee\/proekspert\/wp-new\/wp-content\/uploads\/2024\/09\/RagSkeem.png 994w, https:\/\/clients.triloogia.ee\/proekspert\/wp-new\/wp-content\/uploads\/2024\/09\/RagSkeem-300x180.png 300w, https:\/\/clients.triloogia.ee\/proekspert\/wp-new\/wp-content\/uploads\/2024\/09\/RagSkeem-768x460.png 768w\" data-sizes=\"(max-width: 994px) 100vw, 994px\" src=\"data:image\/svg+xml;base64,PHN2ZyB3aWR0aD0iMSIgaGVpZ2h0PSIxIiB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciPjwvc3ZnPg==\" style=\"--smush-placeholder-width: 994px; --smush-placeholder-aspect-ratio: 994\/595;\" \/><figcaption class=\"wp-element-caption\">Retrieval augmented generation with embeddings<\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">LLMOps<\/h2>\n\n\n\n<p>In the world of AI, the focus now moves from MLOps (workflow for building ML apps) to LLMOps (workflow for building generative AI apps), starting with prompt engineering, a process where we refine the inputs to the LLM (prompts) through a process of trial and error (build-run-evaluate) until the responses meet our quality, cost, and performance requirements.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" width=\"1147\" height=\"646\" data-src=\"https:\/\/clients.triloogia.ee\/proekspert\/wp-new\/wp-content\/uploads\/2024\/09\/llm-app-lifecycle-6509347ca42b47d5c7ae425b890e5efe.png\" alt=\"\" class=\"wp-image-14468 lazyload\" data-srcset=\"https:\/\/clients.triloogia.ee\/proekspert\/wp-new\/wp-content\/uploads\/2024\/09\/llm-app-lifecycle-6509347ca42b47d5c7ae425b890e5efe.png 1147w, 
https:\/\/clients.triloogia.ee\/proekspert\/wp-new\/wp-content\/uploads\/2024\/09\/llm-app-lifecycle-6509347ca42b47d5c7ae425b890e5efe-300x169.png 300w, https:\/\/clients.triloogia.ee\/proekspert\/wp-new\/wp-content\/uploads\/2024\/09\/llm-app-lifecycle-6509347ca42b47d5c7ae425b890e5efe-768x433.png 768w\" data-sizes=\"(max-width: 1147px) 100vw, 1147px\" src=\"data:image\/svg+xml;base64,PHN2ZyB3aWR0aD0iMSIgaGVpZ2h0PSIxIiB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciPjwvc3ZnPg==\" style=\"--smush-placeholder-width: 1147px; --smush-placeholder-aspect-ratio: 1147\/646;\" \/><figcaption class=\"wp-element-caption\">LLM app lifecycle. <a href=\"https:\/\/azure.github.io\/Cloud-Native\/\" target=\"_blank\" rel=\"noreferrer noopener\">Source<\/a><\/figcaption><\/figure>\n\n\n\n<p>In the Azure cloud environment, excellent tools are available for this workflow, such as Azure AI Studio and Azure OpenAI Studio.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" width=\"1093\" height=\"645\" data-src=\"https:\/\/clients.triloogia.ee\/proekspert\/wp-new\/wp-content\/uploads\/2024\/09\/OpenAIstudio.png\" alt=\"\" class=\"wp-image-14470 lazyload\" data-srcset=\"https:\/\/clients.triloogia.ee\/proekspert\/wp-new\/wp-content\/uploads\/2024\/09\/OpenAIstudio.png 1093w, https:\/\/clients.triloogia.ee\/proekspert\/wp-new\/wp-content\/uploads\/2024\/09\/OpenAIstudio-300x177.png 300w, https:\/\/clients.triloogia.ee\/proekspert\/wp-new\/wp-content\/uploads\/2024\/09\/OpenAIstudio-768x453.png 768w\" data-sizes=\"(max-width: 1093px) 100vw, 1093px\" src=\"data:image\/svg+xml;base64,PHN2ZyB3aWR0aD0iMSIgaGVpZ2h0PSIxIiB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciPjwvc3ZnPg==\" style=\"--smush-placeholder-width: 1093px; --smush-placeholder-aspect-ratio: 1093\/645;\" \/><figcaption class=\"wp-element-caption\">Azure OpenAI Studio<\/figcaption><\/figure>\n\n\n\n<p>In the world of the kratt, everything changes rapidly. 
Every day, new technologies emerge that simplify processes but also bring new challenges. For instance, between the time I wrote this article and the time it was published, I&#8217;ve encountered innovations like a \u201cgraph prompt\u201d and a \u201cbuilt-in prompt cache.\u201d Software development itself is also evolving, shifting from strictly programming language-based development to a process akin to LLMOps. We are indeed living in an interesting and fast-changing era, where each day brings something new and unexpected.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>A software engineer offers tips on how to make AI work harder for you<\/p>\n","protected":false},"author":5,"featured_media":14462,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[6],"tags":[],"class_list":["post-14461","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-proekspert-news"],"acf":[],"_links":{"self":[{"href":"https:\/\/clients.triloogia.ee\/proekspert\/wp-new\/wp-json\/wp\/v2\/posts\/14461","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/clients.triloogia.ee\/proekspert\/wp-new\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/clients.triloogia.ee\/proekspert\/wp-new\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/clients.triloogia.ee\/proekspert\/wp-new\/wp-json\/wp\/v2\/users\/5"}],"replies":[{"embeddable":true,"href":"https:\/\/clients.triloogia.ee\/proekspert\/wp-new\/wp-json\/wp\/v2\/comments?post=14461"}],"version-history":[{"count":9,"href":"https:\/\/clients.triloogia.ee\/proekspert\/wp-new\/wp-json\/wp\/v2\/posts\/14461\/revisions"}],"predecessor-version":[{"id":14481,"href":"https:\/\/clients.triloogia.ee\/proekspert\/wp-new\/wp-json\/wp\/v2\/posts\/14461\/revisions\/14481"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/clients.triloogia.ee\/proekspert\/
wp-new\/wp-json\/wp\/v2\/media\/14462"}],"wp:attachment":[{"href":"https:\/\/clients.triloogia.ee\/proekspert\/wp-new\/wp-json\/wp\/v2\/media?parent=14461"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/clients.triloogia.ee\/proekspert\/wp-new\/wp-json\/wp\/v2\/categories?post=14461"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/clients.triloogia.ee\/proekspert\/wp-new\/wp-json\/wp\/v2\/tags?post=14461"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}