Setting Up a Local AI: A Weekend Experiment

I recently spent a weekend going down an AI rabbit hole. The idea was sparked by learning that it was possible to set up an AI Large Language Model (LLM) to run locally, using a tool called Ollama that significantly simplifies the process.

What?

My weekend fascination with AI began when I learned of Daniel Miessler's fabric framework, which has interesting use cases such as extracting the important wisdom from articles and videos. The other main component that made me realise just how simple setting up my own pet AI had become was Ollama, a tool that abstracts all the complicated parts of setting up an LLM into a simple command to download a model and expose a local API.

I started by reading up on these tools; I read far more than necessary, but it was all interesting nonetheless. I should mention that I also ended up using another awesome Ollama integration, Obsidian Copilot, but more on that later.

Why?

At this point, I should mention why I wanted my own local AI. The main reason is that, although tools like fabric and Obsidian Copilot work well with API keys for commercial LLMs like ChatGPT or Anthropic's Claude, I wanted the benefit of privacy.

Using Obsidian Copilot, I would be asking the AI about my personal notes, which I didn't want to be sending off to any server that I didn't control. Also, I didn't want to be paying API fees when I could use my local AI for free (well, free of direct costs anyway).

Ollama setup

The main task was to set up a locally running LLM on my computer. I actually didn't set it up on my main computer, as I mostly use a Framework laptop with no dedicated GPU. Luckily, I have another computer which does have a decent NVIDIA graphics card, and Ollama exposes a simple HTTP API that I could easily make use of over my local network.

The actual setup of Ollama was quite easy. I set it up on a Windows computer, so the entire installation process was downloading the official .exe and running it. It felt a bit too easy, but I now had an Ollama daemon running on my computer.

As for actually setting up the LLM, this is where Ollama shines. I went with Meta's llama3 model, which is freely available, designed for general AI assistant tasks and scores well in benchmarks. As my computer only had 32GB of RAM, I went with the smaller 8-billion-parameter (8B) model rather than the gigantic 70B version.

The actual install was one command in Command Prompt: ollama run llama3. A few minutes of downloading later, I had an interactive chat AI running in the command window. But I wasn't stopping there; I wanted access to the AI from my Obsidian notes, my web browser and more.
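For anyone curious what that looks like, the interactive session is roughly this (the question and the model's reply are placeholders of my own; typing /bye exits the chat):

C:\> ollama run llama3
>>> What is a Zettelkasten?
... the model streams its answer here ...
>>> /bye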

Connecting to an Ollama server

I mentioned before that my main computer is a Framework laptop. I actually run Linux (Mint, if you must know) as I find Windows too annoying. But my Ollama server was on a different machine, which, as it turns out, was not much of a barrier at all.

Ollama exposes an HTTP API out of the box. Just go to localhost:11434 in a browser to see “Ollama is running”. All I needed to do was follow the Ollama FAQ and open the server up to my local network by changing the OLLAMA_HOST environment variable. I was now good to go.
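On Windows this came down to something like the following, run once in Command Prompt and followed by a restart of the Ollama app (treat this as a sketch of what the FAQ describes rather than the exact steps; 0.0.0.0 makes the server listen on all interfaces):

:: Make Ollama listen on all network interfaces instead of just localhost
setx OLLAMA_HOST "0.0.0.0"
:: Restart the Ollama app so it picks up the new environment variable

After that, visiting http://<server-ip>:11434 from another machine on the network should show the same “Ollama is running” message.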

Of course, I did a few quick tests using curl in my terminal, but I needed a smoother way to interact with my “pet” AI.
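A quick test from the laptop looked something like this (the IP address is a made-up example standing in for my server; /api/generate is Ollama's standard text-generation endpoint):

curl http://192.168.1.50:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Why is the sky blue?",
  "stream": false
}'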

Ollama integrations – fabric and Page Assist

The first integration that I wanted to use was fabric. Unfortunately, after installing it I had issues connecting it to Ollama over the network. Normally I would keep trying things until it worked, but I knew that fabric was being overhauled to run in Go rather than Python, with the release due in only a few weeks, so I decided to wait for the new version and move on to other integrations.

One simple integration was Page Assist, a browser extension that can connect to a local Ollama server, including one running over the network. All I had to do was install the Firefox extension (a Chrome extension is also available), put my Ollama server's IP address in the settings, and it was up and running.

The main feature of Page Assist is that it has a nice clean UI to chat with my AI, but it does even more than that. It can use the current webpage as context, allowing me to ask my AI to summarise webpages or describe their content.

It can also perform web searches and use the results to form its answers. It does this using Retrieval Augmented Generation (RAG), which requires a separate embedding model to translate the content into vectors; these are stored and added to the prompt when relevant.

Luckily, it was very easy to set up an embedding model with Ollama: ollama pull nomic-embed-text.
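If you want to sanity-check the embedding model, it is reachable through the same API; a request along these lines (again with your own server's address) should come back with a long vector of numbers:

curl http://192.168.1.50:11434/api/embeddings -d '{
  "model": "nomic-embed-text",
  "prompt": "Retrieval Augmented Generation turns text into vectors"
}'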

Page Assist was now all set up, ready for general queries, processing web pages and searching the web for answers. However, I wanted to be able to easily use the AI on my notes, which is where Obsidian Copilot comes in.

Using Obsidian Copilot with Ollama

For those who don't know, Obsidian is essentially a notes app where all notes are just linked text files, formatted with markdown. This means that all my notes are ready to be fed into a text-based LLM, which opens up some powerful functionality.
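To make that concrete, a typical note is nothing more exotic than this (an invented example; the double square brackets are Obsidian's links between notes):

# Ollama
Ollama runs [[Large Language Models]] locally and exposes them over a simple HTTP API.
Related: [[fabric]], [[Obsidian Copilot]]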

Obsidian Copilot makes this integration simple, providing not just a chat window, but also integrating options to work on specific notes, manipulate highlighted text or use RAG to answer questions based on a whole vault of notes.

Installation of Obsidian Copilot was again very easy. I just browsed the community plugins in Obsidian's settings and installed it. I then had to point it at my Ollama server in the settings, for both the main LLM and the embedding model used for RAG.

A few more tweaks were needed, namely setting Ollama's origin policy and expanding its context window so that it could work on more input at once, but I only had to follow a few simple instructions to complete the setup.
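For anyone following along, those tweaks boiled down to roughly the following on the Windows machine; the origin value and the model name here are just examples, and a Modelfile is one of the ways Ollama lets you change parameters such as the context window:

:: Relax Ollama's CORS policy so the Obsidian app is allowed to call the API
:: ("*" allows everything; a more specific origin can be used if you prefer)
setx OLLAMA_ORIGINS "*"
:: Enlarge the context window by creating a model variant from a two-line Modelfile:
::   FROM llama3
::   PARAMETER num_ctx 8192
ollama create llama3-obsidian -f Modelfile

Copilot would then be pointed at the new model name (llama3-obsidian in this sketch) rather than plain llama3.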

With Obsidian Copilot installed and connected to Ollama, I could now prompt my local AI with commands based on my highlighted text or any note in my vault, or use RAG to ask questions based on my entire Zettelkasten of notes.

Of course, I didn't want to stick to the default prompts available, like summarising text or changing its tone, so I explored the custom prompt options that Obsidian Copilot provides. I actually based some of my custom prompts on those found in the fabric framework, such as summarising an article in a structured format, or improving the grammar of selected text. I found many powerful ways to get more out of my own notes, or out of text copied into Obsidian.
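As an illustration, one of my fabric-inspired custom prompts reads roughly like this (my own wording, applied to whatever text is selected):

Summarise the selected text as a one-sentence overview, followed by the five most important ideas as short bullet points, and finish with any sources or links mentioned.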

Ollama on my phone

Before the weekend was over, there was one more way of talking to my “pet” AI that I wanted to set up. I had found an Android app simply named Ollama App. All I had to do was download it on my phone, install it (I already had installation of non-Play-Store apps enabled) and point it to my local Ollama server.

It currently only works while I am at home, as I obviously have not exposed my Ollama server to the public internet. However, a simple VPN such as WireGuard running on my home NAS (TrueNAS Scale, if you are interested) would allow me to access my local LLM from anywhere.

Conclusion

The weekend was now over and I had succeeded. I now had a local LLM which I could use from my web browser, my notes app and my phone, with powerful integrations to make use of my own private content.

Sure, I could just use ChatGPT, but many of these uses would require connecting to the API, which isn't free. Perhaps more importantly, this setup keeps all my data local, on servers that I control.

That was my weekend; I just felt like writing about it after going down that rabbit hole for two straight days. At least I have some useful tools to show for it.

P.S. This was written by me; my AI only contributed a little bit of feedback.

By @nicholasspencer@infosec.exchange | My LinkedIn