diff --git a/ai.md b/ai.md
deleted file mode 100644
index 8d4f33d..0000000
--- a/ai.md
+++ /dev/null
@@ -1,76 +0,0 @@
-# context of what I am trying to do
-my interest in LLM is mostly around technical / computer stuff, technical writing, and professional development. Example of things I am interested in:
-* apply advanced configurations to my router (configure different nets with firewal rules, VPN gateways, site to site VPNs, etc)
-* various home lab endeavors (media servers, VPN endpoints, etc).
-* home automation. Set up smart stuff in my house. monitor them. Example set up my z-wave thermostat with home assistant, set up rules, help monitor my power bills, etc.
-* technical writing. When wrapping up a project I would like the AI to document it, tell me what's missing in my documentation trail. For example I would like to make updating this website a much easier step when working on my projects, especially when I have interesting results.
-* wrap up unfinished projects resembling the ones described here.
-* Vibe code some stuff. Example: I want a little app to keep track of my HSA receipts where I could just to to myapp.mydomain.com, and the app would have one feature: scan a receipt and parse it / ask me to specify medatadata. THen it would maintain the results in a database, maintain the pictures of receipts when I need to make a claim. That's just as example that's meant to be achieavable but not trivial. We'll see where it takes me.
-
-# learinng
-My objective is to understand how LLMs work. In particular I want to gain better understanding of:
-* when they work well
- * see my example applications
-* when they tend to fail
- * what I can do about that
-* How to get more predictable results
- * in this case I would like to improve both consistency: different output not contradicting each other on similar input, and quality (more relevant and insightful output)
-* Reduce instances of LLM's simply ignoring my instructions: "For the 4th time: use UV when running python, DO NOT set a virtual environment yourself".
-* how LLMs work at the mathematical level (I used to be a legit statistician after all!)
- * I want to understand embeddings, what they mean. be able to make sense of the variout transformer steps, be able to do 1-2 iteration on paper and understand what I'm achieving. Understand encoders only vs decoder only, model heads, etc.
-* Why LLMs sometimes cycle between telling you something and the complete opposite
-* How to manage context windows effectively
-* Identify when I'm failing at using models vs using the wrong model vs having bad information.
-* Identify when my model is good enough, but outdated (does not "know" about breaking changes in a software)
-* evaluate the difference between paid-for and "local" models.
-* evaluate the differences between public free models
- * ... and probably paid for models too since they are what I use professionally.
- * evaluate against consistency and quality of the output. does claude suggest a more complete answer when asked to "do task XYZ" and does it doing consistently on the same input.
-* how can I maintain my privacy doing all this.
- * from my IPS, my employer, my family, hackers, the government, google, my job.
- * for example imagine I am trying to set up a VPN to conceal from the government that _Facebook_ (or _The Onion_, same thing really) is my main source of information, I don't think my government or my wife should know about this.
- * not to mention. I'd like to be able to supply an API key to the AI when asking it to configure something, with certainly that it's handling this properly. In the case of public models I don't have much hope I will be able to do this, but I should be able to do this on my LAN.
-* prompt engineering. I want to become really good at getting the LLM to do what I want, as cheaply and as quickly as possible.
-* learn how to deal with HUGE contexts (very large code bases, very large documents, etc)
- * huge here meaning tasks currently exceeding what models can do. At the time of writing context windown of the models that I am using can stretch to 100K-200K tokens. What if I'm trying to sort a code base with 2M lines of code. What if I'm parsing GBs of logs to find trends (ok AI may not be the first tool for this)
-
-# Motivation
-To be 100% honest, my motivation is part **curiosity** (I am a very curious person, I like gardening, math, geology, history, psychology, etc), part **continued employment**. I don't shine at worker faster or using the coolest tools. I am, however, very capable of thinking critically, **asking why** even when it bothers everyone else. AI will likely disrupt the technology landscape, but it really opening a bunch of opportunities for us all. We have to figure out what they are.
-* In general, software / tech engineers don't work the same way that they did 5 years ago. Five years ago I was working on reprojecting maps in tile servers. At the time of writing I am developing data pipelines for robotic applications. Who knows what the future holds for me, but I want to be ready to be a survivor of the AI disruption.
-
-# hardware
-I am blessed to have two Nvidia GPU on hand (RTX 3090, A30), each with 24GB of RAM, and enough spillover DDR5 RAM (96GB). Due to limitations of consumer-grade GPU, I cannot pool their memory, but I can run a sizable collection of models with this amount of memory.
-
-My plan is to accept the speed penalty when swapping memory to run much larger, allegedly better models, so I can see what they are really capable of. For example, to test how to get **reproducible and valid** answers on really complex queries. I don't mind waiting 5/10 minutes on a very big model as a test.
-
-example complex queries:
-* multistep queries requiring information retrieval: "figure out how to set up a separate net on my router, with a VPN gateway -- you need to figure out how to set up the wireguard interface --. Make sure the net will never spill traffic on the other gateways. make sure I can test that the net work. Also you need to be able to work with a DNS entry for the remote endpoint, and not a static IP -- if the router does not support it find a way around it"
-* look at this massive email chain that I got and help me figure out the mood of the client, and what should be the next step. Retrieve information from my code base / git / some system. if it can be relevant to understand what's doing on
-
-I am also blessed to have access to an Nvidia RTX 5070 GPU with 12GB of VRAM that I use for my work at Forterra. I don't do too much after hours development, but sometimes it is the only computer I have access to. and I can test an idea during my lunch. Yes: I obtained the blessing of our cybersecurity team to do this!
-
-What I really want to do is play with the software and the hardware. In particular I want to run these models locally as much as possible. I don't object using commercial models either.
-
-
-# evaluation
-I want to get a better mental picture how to evaluate output, and also how to manage the vast output that LLMs give while maintaining my sanity. I am not trying to understand everything (honestly if the AI writes a wicker parser for a crappy file format that I need to parse, an long as I understand the spec I'm not trying to micromanage everything). However I want to remain on top of it so I can keep iterating **and be confident** that the output is good. In my experience this is not easy at any rate, and not with AI.
-
-* this is kinda open ended by definition. I don't know yet what to do with this. example AI is outputing tons of text. Am I supposed to save it? save when I input to get it. How do I keep trace of it. What's worthy of saving it and what's not?
-
-# NOT trying to do
-I am not trying (yet) to do a whole lot of agent integration. I'm fine talking to a python script for now. Also I am not trying, yet, to start a company or create anything novel. I am also not that interested in scaling just yet -- I may be later but my objective is mastery of the tool, not productizing my learing.
-
-I want, however, to be able to use reliably what I create in my home lab, for my own uses. Maybe my wife can tap the resources that I create for her own projects as well.
-
-# AI SUGGESTIONS
-* RAG (retrieval-augmented generation) -- you're already building this with vyosindex/ChromaDB but it's not listed as an explicit learning objective. Chunking strategies, embedding models, vector similarity, and retrieval quality are worth studying deliberately.
-* fine-tuning vs RAG vs in-context learning -- when to use each, and the tradeoffs. Directly relevant to your "outdated model" bullet.
-* quantization and model formats -- you plan to run big models on consumer hardware. Understanding quantization levels and their impact on quality vs speed vs memory will save you a lot of trial and error.
-* reproducibility -- you mention wanting reproducible answers. Worth digging into seeds, temperature settings, and the fundamental non-determinism of GPU floating point. Even temperature=0 doesn't guarantee identical outputs across runs.
-* hallucination detection and grounding -- goes beyond "when they fail." Specific strategies for verifying outputs, forcing citations, and grounding responses in source material.
-* cost analysis -- token pricing for API models, electricity/time cost for local inference, and when local vs cloud actually makes economic sense for a given task.
-* security and prompt injection -- ties into your privacy bullet. Understanding what "private" really means when running locally vs sending data to an API, and how prompt injection attacks work.
-* the math curriculum -- you say you want to understand the math. Worth listing the subtopics: attention mechanism, transformer architecture, tokenization, loss functions, backpropagation, softmax, positional encoding. That's a curriculum in itself and you should scope it.
-* evaluation frameworks -- you mention wanting to evaluate output quality. There are both formal benchmarks and task-specific eval approaches worth knowing about, even if you don't use them directly.
-* multi-modal models -- vision, audio, code. You have the hardware. Worth deciding whether this is in scope or explicitly out of scope for now.
-* model selection heuristics -- how to quickly decide which model to reach for given a task, without trial and error every time. Ties into your "wrong model" bullet.
\ No newline at end of file
diff --git a/site/content/llm/_index.md b/site/content/llm/_index.md
new file mode 100644
index 0000000..88b5dff
--- /dev/null
+++ b/site/content/llm/_index.md
@@ -0,0 +1,6 @@
++++
+date = '2026-04-03T23:42:21Z'
+draft = false
+title = 'LLM'
+tags = []
++++
diff --git a/site/content/llm/learning-motivations.md b/site/content/llm/learning-motivations.md
new file mode 100644
index 0000000..dd46b9b
--- /dev/null
+++ b/site/content/llm/learning-motivations.md
@@ -0,0 +1,132 @@
++++
+date = '2026-04-03T23:43:34Z'
+draft = false
+title = "Why I'm Learning LLMs"
+tags = ['llm', 'learning', 'motivation', 'AI-reviewed']
++++
+
+## Motivation
+
+Part **curiosity** — I like gardening, math, geology, history, psychology, and now this. Part **continued employment**. I don't shine at working faster or adopting the coolest tools. I am, however, capable of thinking critically and **asking why**, even when it bothers everyone else.
+
+AI will disrupt the technology landscape. It's also opening opportunities. We have to figure out what they are.
+
+Software engineers don't work the way they did five years ago. Five years ago I was reprojecting maps in tile servers. Today I develop data pipelines for robotic applications. I want to be ready for whatever comes next.
+
+## What I Want to Learn
+
+My objective is to understand how LLMs work. Specifically:
+
+### When they work, when they don't
+
+- When they tend to fail — and what I can do about it
+- Why they sometimes cycle between telling you something and the complete opposite
+- See the example applications below for the kind of tasks I have in mind
+
+### Getting predictable results
+
+- Improve **consistency**: different outputs should not contradict each other on similar input
+- Improve **quality**: more relevant, more insightful output
+- Reduce instances of the model flat-out ignoring instructions ("For the 4th time: use UV when running Python. DO NOT set a virtual environment yourself.")
+- Learn to manage context windows effectively
+
+### Diagnosing problems
+
+- Tell apart: me failing at using the model vs picking the wrong model vs feeding bad information
+- Identify when the model is capable but outdated — doesn't "know" about breaking changes in a library, for example
+
+### Comparing models
+
+- Evaluate paid vs local models
+- Evaluate free public models (and paid ones too, since that's what I use professionally)
+- Measure consistency and quality: does Claude give a more complete answer than Llama for task X, and does it do so reliably on the same input?
+
+### The math
+
+I used to be a legit statistician. I want to understand LLMs at the mathematical level:
+
+- Embeddings: what they mean, how they're computed
+- Transformer steps: be able to trace 1–2 iterations on paper and understand what I'm achieving
+- Encoder-only vs decoder-only architectures, model heads
+- Enough depth to build real intuition, not just hand-waving
+
+### Prompt engineering
+
+Get good at getting the model to do what I want, as cheaply and quickly as possible.
+
+### Dealing with scale
+
+Large contexts — very large code bases, very large documents. "Large" meaning tasks that exceed current model capacity. At the time of writing, models I use stretch to 100K–200K tokens. What about a codebase with 2M lines? What about parsing GBs of logs for trends? (AI may not be the first tool for that last one.)
+
+### Privacy
+
+- From my ISP, my employer, my family, hackers, the government, Google, my job
+- Example: suppose I'm setting up a VPN to keep my government from knowing that _Facebook_ (or _The Onion_ — same thing really) is my main news source. Neither my government nor my wife should know about this.
+- I want to supply an API key to the AI when asking it to configure something, with confidence it's handled properly. With public models I don't expect this. On my LAN, I should be able to achieve it.
+
+## What I Want to Build
+
+My interest in LLMs centers on technical work, technical writing, and professional development. Concrete examples:
+
+- **Router configuration** — advanced setups: separate networks with firewall rules, VPN gateways, site-to-site VPNs
+- **Home lab** — media servers, VPN endpoints, related projects
+- **Home automation** — Z-Wave thermostat with Home Assistant, rules, power bill monitoring
+- **Technical writing** — have the AI document finished projects, identify gaps in my documentation trail, make updating this website easier when I have interesting results
+- **Finish stalled projects** — ones resembling the above
+- **Vibe code** — example: a small app to track HSA receipts. Go to `myapp.mydomain.com`, scan a receipt, parse it, attach metadata, store results and images in a database for claims. Achievable but not trivial. We'll see where it takes me.
+
+## Hardware
+
+I have two Nvidia GPUs on hand:
+
+| GPU | VRAM |
+|-----|------|
+| RTX 3090 | 24 GB |
+| A30 | 24 GB |
+
+Plus 96 GB of DDR5 RAM for spillover. Consumer-grade GPU limitations prevent pooling the GPU memory, but I can run a sizable collection of models with this setup.
+
+My plan: accept the speed penalty of memory swapping to run larger, allegedly better models and see what they're actually capable of. I don't mind waiting 5–10 minutes on a big model as a test — especially for **reproducible and valid** answers on complex queries.
+
+Examples of complex queries:
+
+- *Multi-step with information retrieval*: "Set up a separate network on my router with a VPN gateway. Figure out the WireGuard interface config. Make sure traffic never spills to other gateways. Make sure I can test it. Handle a DNS entry for the remote endpoint instead of a static IP — if the router doesn't support it, find a workaround."
+- *Messy real-world context*: "Read this massive email chain. Help me gauge the client's mood and decide the next step. Pull context from my codebase, git history, or some other system if it helps."
+
+I also have access to an RTX 5070 (12 GB VRAM) through my work at Forterra. I don't do much after-hours development, but sometimes it's the only machine available and I can test an idea during lunch. (Yes — I got approval from our cybersecurity team.)
+
+What I really want is to play with the software and the hardware. Run models locally as much as possible. I don't object to commercial models either.
+
+## Evaluating Output
+
+I want a better mental picture of how to evaluate LLM output — and how to manage the volume of text these models produce without losing my mind.
+
+I'm not trying to understand everything. If the AI writes a parser for a crappy file format and I understand the spec, I'm not going to micromanage it. But I want to stay on top of things so I can keep iterating **and stay confident** the output is good. In my experience, this is hard in general — and not easier with AI.
+
+- This is open-ended by definition. The AI outputs tons of text. Am I supposed to save it? Save what I fed it? How do I keep track? What's worth keeping and what's not?
+
+## What I'm NOT Doing
+
+- Not pursuing heavy agent integration yet. I'm fine talking to a Python script for now.
+- Not starting a company or creating anything novel.
+- Not focused on scaling — my objective is mastery of the tool, not productizing my learning.
+
+I do want to use what I build reliably in my home lab. Maybe my wife can tap into the resources too.
+
+---
+
+## Potential Leads Going Forward
+
+> *The ideas below were suggested by AI (Claude) based on the goals above. They're not commitments — just threads worth pulling on.*
+
+- **RAG (retrieval-augmented generation)** — I'm already building this with vyosindex/ChromaDB, but it's not listed as an explicit learning objective. Chunking strategies, embedding models, vector similarity, and retrieval quality are worth studying deliberately.
+- **Fine-tuning vs RAG vs in-context learning** — when to use each, and the tradeoffs. Directly relevant to the "outdated model" question.
+- **Quantization and model formats** — running big models on consumer hardware means understanding quantization levels and their impact on quality, speed, and memory.
+- **Reproducibility** — seeds, temperature settings, and the fundamental non-determinism of GPU floating point. Even `temperature=0` doesn't guarantee identical outputs across runs.
+- **Hallucination detection and grounding** — goes beyond "when they fail." Specific strategies: verifying outputs, forcing citations, grounding responses in source material.
+- **Cost analysis** — token pricing for API models, electricity and time cost for local inference, and when local vs cloud makes economic sense.
+- **Security and prompt injection** — what "private" really means locally vs through an API, and how prompt injection attacks work.
+- **The math curriculum** — subtopics worth scoping: attention mechanism, transformer architecture, tokenization, loss functions, backpropagation, softmax, positional encoding. That's a curriculum in itself.
+- **Evaluation frameworks** — formal benchmarks and task-specific eval approaches, even if not used directly.
+- **Multi-modal models** — vision, audio, code. The hardware supports it. Worth deciding if this is in or out of scope.
+- **Model selection heuristics** — how to quickly pick the right model for a task without trial and error every time.
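A footnote on the reproducibility lead above: the temperature knob is small enough to sketch with the standard library. This is a minimal toy sketch — the logits are hypothetical numbers for an imaginary 3-token vocabulary, not any real model's output — but it shows why low temperature pins sampling to the top token and why a seeded sampler repeats itself:

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Turn raw logits into probabilities; temperature rescales logits first.
    Low temperature sharpens the distribution toward the top logit,
    high temperature flattens it toward uniform."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max before exponentiating, for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(logits, temperature, rng):
    """Draw one token index from the temperature-adjusted distribution."""
    probs = softmax(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

logits = [2.0, 1.0, 0.1]  # hypothetical logits for a 3-token vocabulary

sharp = softmax(logits, temperature=0.1)  # nearly all probability mass on token 0
flat = softmax(logits, temperature=10.0)  # close to uniform

# A seeded sampler repeats itself run-to-run on the same machine; GPU
# floating-point non-determinism in real inference stacks can still break this.
rng_a, rng_b = random.Random(42), random.Random(42)
draws_a = [sample_token(logits, 1.0, rng_a) for _ in range(5)]
draws_b = [sample_token(logits, 1.0, rng_b) for _ in range(5)]
```

Note that `temperature=0` would divide by zero here; real samplers typically special-case it as argmax — which is why even greedy decoding can still vary across runs when the underlying floating-point math is non-deterministic.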