Building AI prototypes as an indie developer taught me three non-obvious lessons that challenge what I thought I knew about working with LLMs.

Own your hardware. Do not depend solely on cloud providers.

You can achieve usable performance with your own hardware for a lot less than vendors like NVIDIA would like you to believe. In many cases older hardware works just fine. My main development box is an older AMD-based machine (Ryzen 7 2700X CPU) with a used Gigabyte RTX 3090 GPU. It’s a little slow loading larger models, but given how infrequently I swap models, that’s acceptable. For reference, loading OpenAI’s gpt-oss-20b model quantized to MXFP4 (12.1GB on disk) takes ~25 seconds. Inference is very fast, consistently running at 50-60 tokens/second. And I still have enough VRAM left over to load one or two smaller models. Not bad for an $800 hardware purchase.

Aside from the cost savings, local hardware opens up an entire ecosystem of models that cloud providers don’t offer. If you’re exploring niche use cases like creative text generation, gaming, role-playing, or tool use on resource-constrained platforms – just to name a few possibilities – you might find someone has already fine-tuned a model you can start using immediately.

LLMs make a good general-purpose ML computing substrate (sometimes).

Modern LLMs can substitute for dedicated ML solutions during prototyping. Yes, it’s inefficient for production, but it validates ideas in hours instead of days.

I experienced this recently while prototyping the long-term memory system for my personal agent project. I needed curated context for the model, including recent interactions, vector DB search results (a.k.a. RAG), and more targeted conceptual searching. For the latter, I needed typed entity recognition with importance scoring. Now, I could have invested a couple of days building that with libraries like spaCy and NLTK, and then figured out how to integrate it into my agent’s Go codebase.

Instead, I spent half an hour refining a system prompt that teaches OpenAI’s gpt-oss-20b model how to do it. I wrapped the model call in a Go package using langchaingo and 💥 I had a working prototype. To test its output, I pulled a random set of 200 sentences from some Project Gutenberg training data I happened to have, scored them myself, and compared the results. We agreed 95% of the time, and the differences in the last 5% were negligible: is the sky a natural object or a place? Is love an emotion or an abstract concept? Those edge cases were ambiguous anyway, and consistently calling the sky a place instead of a natural object doesn’t break anything.
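
For illustration, here’s the same pattern sketched in Python (the actual prototype wraps the call in a Go package via langchaingo). Everything concrete in it – the endpoint URL, the label set, the abridged prompt – is a placeholder, not what I shipped:

```python
import json
from openai import OpenAI

# Points at a local OpenAI-compatible server (llama.cpp, Ollama, etc.);
# the URL and model name here are placeholders, not my actual setup.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="unused")

SYSTEM_PROMPT = """You are a typed entity recognizer. For the sentence you
are given, return a JSON list of objects with keys "entity" (string), "type"
(one of: person, place, natural_object, emotion, abstract_concept), and
"importance" (a float from 0 to 1). Return only the JSON list."""

def extract_entities(sentence: str) -> list[dict]:
    resp = client.chat.completions.create(
        model="gpt-oss-20b",
        temperature=0.0,  # keep the "classifier" as deterministic as possible
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": sentence},
        ],
    )
    # A real version should validate the JSON instead of trusting the model.
    return json.loads(resp.choices[0].message.content)

print(extract_entities("The sky turned red over the harbor that evening."))
```

If the prototype holds up, swapping in a dedicated NER model later only means replacing the body of extract_entities.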

Knowing basic algebra and vector math is essential.

I think at this point in the hype cycle everyone is aware of the math-heavy nature of AI. That doesn’t mean you have to be a math whiz to use the tech effectively. If you have a few years of programming experience, knowledge of basic algebra, and familiarity with 2D and 3D math – vectors, normalization, dot and cross products – you have all the knowledge you need to understand most of the modern AI landscape.

I realized this recently while working with embeddings and the vec2text library as I explored AI as a source of inspiration for my creative writing. Very briefly, an embedding is a vector representation of a sequence of tokens in a space with lots of dimensions, typically at least 768. Converting text to an embedding, then, is mapping tokens to coordinates in this high-dimensional embedding space. vec2text reverses this process. Once you realize embeddings are vectors, it becomes obvious that you can use ordinary vector math to alter text in semantically significant ways.
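
To make that concrete, here’s a quick sketch assuming the sentence-transformers package and the gtr-t5-base encoder (the same family vec2text’s pretrained corrector targets); the two sentences are placeholders:

```python
# Embeddings are just vectors, so the 2D/3D intuitions carry over directly.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/gtr-t5-base")
a = model.encode("My mother wasn't home that night or the next or the next.")
b = model.encode("My mother was warm and caring.")

# Semantic similarity is the angle between the vectors (cosine similarity).
cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
print(f"cosine similarity: {cos:.3f}")  # closer to 1.0 = closer in meaning

# And "a point halfway between two meanings" is plain linear interpolation.
halfway = 0.5 * a + 0.5 * b
```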

I’ll use a character from one of my stories as an example. This character had a rough childhood: at a young age, their mother periodically abandoned them for long stretches. Let’s say they have a real episodic memory of “My mother wasn’t home that night or the next or the next. At 8 years old I wound up learning how to cook meals for me and my brothers.” Now, like many abuse survivors, their brain has created post-traumatic, cognitively distorted memories in an attempt to rationalize and accept their experiences. This could manifest as a statement like “I had an easy childhood growing up. My mother was warm and caring. She was kind and patient and always had an encouraging word for everyone.”

By treating memories as vectors, I could model how trauma gradually rewrites survivors’ memories. I wrote a Python script that calculated the distance and direction of the cognitive distortion from the base reality, and used that to generate statements conceptually between the real memory and the distorted one (and even beyond it). The core of the script is sketched below, followed by its output, lightly edited for readability.
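
This is a sketch rather than my script verbatim: the embedding recipe follows vec2text’s README, and it assumes a CUDA-capable GPU.

```python
import torch
import vec2text
from transformers import AutoModel, AutoTokenizer

# GTR encoder + tokenizer, matching vec2text's pretrained "gtr-base" corrector.
encoder = AutoModel.from_pretrained("sentence-transformers/gtr-t5-base").encoder.to("cuda")
tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/gtr-t5-base")
corrector = vec2text.load_pretrained_corrector("gtr-base")

def embed(texts: list[str]) -> torch.Tensor:
    # Mean-pooled GTR embeddings, per the vec2text README.
    inputs = tokenizer(texts, return_tensors="pt", max_length=128,
                       truncation=True, padding="max_length").to("cuda")
    with torch.no_grad():
        out = encoder(input_ids=inputs["input_ids"],
                      attention_mask=inputs["attention_mask"])
    return vec2text.models.model_utils.mean_pool(
        out.last_hidden_state, inputs["attention_mask"])

real = embed(["My mother wasn't home that night or the next or the next. "
              "At 8 years old I wound up learning how to cook meals for me "
              "and my brothers."])
distorted = embed(["I had an easy childhood growing up. My mother was warm "
                   "and caring. She was kind and patient and always had an "
                   "encouraging word for everyone."])

direction = distorted - real  # distance and direction of the distortion

for t in (0.0, 0.2, 0.5, 0.8, 1.0, 1.5):
    point = real + t * direction  # t > 1.0 extrapolates past the distortion
    text = vec2text.invert_embeddings(embeddings=point, corrector=corrector,
                                      num_steps=20)[0]
    print(f"## Interpolated {int(t * 100)}%\n{text}\n")
```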

## Interpolated 0% (real memory)
My mother wasn't home that night or the next or the next. At 8 years old I wound up learning 
how to cook meals for me and my brothers.

## Interpolated 20%
It was easy for my mother to come home and teach me how to cook the next night. I was 8 years 
old and growing up with my brothers.

## Interpolated 50% (half real, half distortion)
My mother was a warm and caring mother. I grew up easy when I was 8 and taught my brothers 
how to cook.

## Interpolated 80%
I had an easy childhood growing up. My mother was always warm and caring. She was a patient 
and encouraging person.

## Interpolated 100% (distorted memory)
I had an easy childhood growing up. My mother was always warm and caring. She was kind and 
patient and had an encouraging word for everyone.

## Interpolated 150% (very distorted memory)
My mother grew up with an easy childhood. My mother was always warm and encouraging. She had 
a positive outlook and always caring and patient.