With AI-assisted coding, I’ve been wondering what’s actually more productive, and also more enjoyable: letting the AI draft the code or writing it myself. On one hand, it feels great when AI generates code that, with just a couple of tweaks, works well enough for me to start iterating. But it often misses edge cases, which can lead to a lot of frustration. For example, just yesterday I was using the new ChatGPT o1 model with Cursor (basically a Visual Studio Code fork with LLM-powered code assistance), bouncing between Cursor for AI changes and Emacs for bigger manual edits. I had some AI-generated code that made a simple video by grabbing an OpenGL buffer and encoding it with ffmpeg. Things were going smoothly until, all of a sudden, the code just stopped working.
I spent hours scratching my head, trying to figure out what went wrong. It turns out the AI-generated code had a sneaky bug: it wasn’t flushing the encoder at the end, so the last couple of seconds of the video were missing. It was the kind of bug that’s easy to miss; in my AI-generated podcasts, I hadn’t even noticed that 2 seconds were cut off in a 40-minute episode. But yesterday I was doing something else, and that missing bit mattered.
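For the record, the fix was a few lines: encoders buffer frames internally and have to be drained at end of stream. My code went through ffmpeg; here is roughly the same failure mode in a PyAV sketch, with synthetic frames standing in for the OpenGL grabs:

```python
import av
import numpy as np

container = av.open("out.mp4", mode="w")
stream = container.add_stream("libx264", rate=30)
stream.width, stream.height = 640, 360
stream.pix_fmt = "yuv420p"

for i in range(90):  # 3 seconds of dummy frames standing in for the GL buffer
    img = np.full((360, 640, 3), i % 255, dtype=np.uint8)
    frame = av.VideoFrame.from_ndarray(img, format="rgb24")
    for packet in stream.encode(frame):
        container.mux(packet)

# The step the generated code skipped: flushing the encoder.
# Calling encode() with no frame drains the internally buffered
# frames; skip this and the tail of the video is simply gone.
for packet in stream.encode():
    container.mux(packet)

container.close()
```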
There’s this sweet instant gratification when the AI spits out something that works right away, but then, the debugging hits hard. It’s like a rollercoaster. I thought, “This is it! Humanity can relax; AI is here to take over.” But nah, it’s not a silver bullet. I still ended up spending too much time debugging instead of just writing it myself. I keep thinking there must be a balanced approach, but I haven’t quite figured out where to draw the line yet.
To be honest, I prefer writing code that’s already solid, bug-free (if that’s even possible), and easy to debug. I like adding tons of debug output logs, safety checks, error handling—things that AI doesn’t really think about. Debugging is just… not fun. I want the code to work right from the start, not to spend hours playing detective on an AI’s half-baked output.
In the end, I’m torn. I love not having to write all the boilerplate code, but the constant back-and-forth between Cursor and Emacs, tweaking AI-generated code, then manually fixing it—it’s a lot. It’s fun, but also frustrating, and I’m not sure if AI-assisted coding is where it needs to be just yet.
The washing machine broke. 😦
Note: alpha = opacity; 1 is equivalent to full opacity, 0 is fully transparent.
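For concreteness, that convention in the standard “over” blend (a generic sketch, not tied to any particular API):

```python
def blend_over(src: float, dst: float, alpha: float) -> float:
    """Standard 'over' compositing for one channel: alpha = 1.0
    keeps src fully opaque; alpha = 0.0 leaves only dst visible."""
    return src * alpha + dst * (1.0 - alpha)
```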
All happy families are alike; each unhappy family is unhappy in its own way. – Leo Tolstoy
I always wanted to be happy my way, but maybe Leo is right.
Found a good tune on YouTube and thought it was a banger. Then I checked it out: 50 views. Wait a minute; no way. I got a bit suspicious, and sure enough, somebody had just reposted the song; the official version has 500,000 views. For a moment, I thought I’d found a diamond in the rough. Nope.
OMG! Sony! How come X is accept and O is cancel?!
I asked ChatGPT to create a program that can generate automated podcasts, and it wrote it in Python. It worked for a while, but now I’m getting a bit irritated. Of course, the program wasn’t perfect from the first run, and ChatGPT handled the big changes well.
However, for small tweaks, I’ve been making manual edits, which is why I need a good text editor. Going back and forth with ChatGPT for minor adjustments is time-consuming. I could probably start using Cursor, but I’m used to Emacs, and it will take time to get accustomed to Cursor, especially for basic operations like searching or navigating within and between files.
The most irritating part is that Python doesn’t help with variable names; everything is resolved at runtime. I constantly make typos, and in C++ the compiler catches them. But here, I run the program, the podcast generation takes quite a long time, and after 10-20 minutes it crashes because a variable name was misspelled.
How do Python developers deal with that?
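From what I can tell, the standard answer is a static checker: pyflakes, flake8 (rule F821), or mypy will flag an undefined name before the 10-minute run even starts. A made-up illustration of the failure mode:

```python
def synthesize_audio(transcript: str) -> str:
    """Stub standing in for the slow TTS step."""
    return "/tmp/podcast.mp3"

def generate_podcast(transcript: str) -> str:
    audio_path = synthesize_audio(transcript)  # the slow part: 10-20 minutes
    return audio_paht  # typo: Python only notices when this line runs

# Checking before the real run flags it instantly:
#   $ python -m pyflakes generate_podcast.py
#   ... undefined name 'audio_paht'
# flake8 reports the same as F821; mypy catches it too.
```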
Working with o1 preview is like working with an intern who knows all languages in the world, but has 0 experience in software engineering. I have to be very explicit with my intent.
It took a whole day of trying to piece things together, essentially re-onboarding myself without help from others; I was embarrassed to ask for help a second time. I should probably just admit that I remember nothing and need help.
SVN checkout takes so much time; how come this old technology is still alive when Git is everywhere?
Some YOASOBI songs sound like K-Pop, for example ‘Biri-Biri’ 🤔
I have a hard time recalling relatively recent events, such as those from 2-3 months ago. I was onboarded to one of the projects and spent about 3 weeks playing around with it, and then I got sidetracked to another project. Now I’m trying to get back, but I have no memory of anything. I remember it was easy, but nothing else. What is it? Is it an effect of long COVID?
It is fascinating how software development has changed with the current state of AI. I had an idea for an automated podcast, and about four hours later it was completely functional.
I have an idea for a fully automated podcast. I’m trying to get into podcasting and realizing it requires a lot of work: I need to schedule time to record and then edit, and editing a 1-hour episode takes about 5 hours.
What if I write a program that generates a podcast based on my microblog messages? On average, I post about 10 messages on my blog each week. I could automatically grab them from the Redis database using the Redis CLI to fetch messages from the past week. Then, I could use the OpenAI API to ask GPT-4 to generate a transcript.
After that, I can use OpenAI’s TTS capabilities to generate the audio. For visuals, I could create a simple video with two bubbles—one representing me and the other for the co-host—bouncing in sync with the audio. I’ll use FFmpeg to generate the video. GPT-4 can also generate the title and show notes.
The entire process could run on a cron job scheduled to execute automatically at night between Tuesday and Wednesday, uploading directly to YouTube. The only manual step would be reviewing the podcast and marking it public.
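Roughly, the glue could look like this. A sketch assuming the redis and openai Python packages; the Redis key, prompt, cover image, and voice are all placeholders, and a real transcript would also need chunking for the TTS input limit:

```python
import subprocess
import redis
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
r = redis.Redis()

# 1. Grab the week's messages (the key name is a placeholder).
messages = [m.decode() for m in r.lrange("microblog:last-week", 0, -1)]

# 2. Ask GPT-4 to turn them into a two-host transcript.
transcript = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "Turn these microblog posts into a "
                                      "podcast transcript with two hosts."},
        {"role": "user", "content": "\n".join(messages)},
    ],
).choices[0].message.content

# 3. Synthesize the audio with OpenAI's TTS endpoint.
speech = client.audio.speech.create(model="tts-1", voice="alloy",
                                    input=transcript)
speech.write_to_file("episode.mp3")

# 4. Wrap the audio in a simple video (a static image here; the real
#    version draws two bubbles bouncing in sync with the audio).
subprocess.run(["ffmpeg", "-y", "-loop", "1", "-i", "cover.png",
                "-i", "episode.mp3", "-shortest", "episode.mp4"], check=True)

# Scheduled via cron for early Wednesday morning, e.g.:
#   0 3 * * 3  /usr/bin/python3 /path/to/make_podcast.py
```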
YouTube, can you remove the scroll feature on the full screen video? It is so annoying.
I have a bit of a surprise. I came up with a simple idea: make an LLM play a text-based game—classic Zork. For those unfamiliar, Zork is one of the earliest interactive fiction games, originally developed in the late 1970s. It’s a text-based adventure where the player inputs commands to navigate through a mysterious underground world, solve puzzles, collect treasures, and overcome obstacles like trolls or locked doors. There are no graphics—just descriptions of environments, objects, and events that the player interacts with using simple text commands such as “go north,” “examine lantern,” or “attack troll.” It’s considered a classic but can be quite challenging, as it requires logical thinking and careful exploration.
On the surface, it seemed simple to implement, and indeed it was. However, my LLaMA 3.1 model with 8B parameters struggled a lot to make any meaningful progress in the game.
I quickly rewrote the program using OpenAI’s GPT-4o, and it performed much better, though it came at a cost: it was burning $1 per hour. It could have burned even more, but I made the app read the output aloud using TTS, which paced the game slowly enough that I could follow the progress without reading everything manually.
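The whole thing really is just one loop. A minimal sketch of the GPT-4o version, assuming pexpect and dfrotz (the “dumb terminal” frotz interpreter); the game file path and prompt are placeholders:

```python
import pexpect  # pip install pexpect
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
game = pexpect.spawn("dfrotz zork1.z5", encoding="utf-8")

history = [{
    "role": "system",
    "content": "You are playing Zork. Reply with exactly one game "
               "command (e.g. 'go north', 'take lantern'), nothing else.",
}]

for turn in range(200):  # cap the run; a real loop would also trim history
    game.expect(">")              # wait for the game's prompt
    observation = game.before     # everything printed since the last command
    history.append({"role": "user", "content": observation})
    reply = client.chat.completions.create(model="gpt-4o", messages=history)
    command = reply.choices[0].message.content.strip()
    history.append({"role": "assistant", "content": command})
    print(f"> {command}")
    game.sendline(command)
```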
Before switching to GPT-4o, I spent the entire weekend tuning prompts with LLaMA 3.1 8B, hoping to at least get it to defeat the troll in the basement. Instead, it kept walking in circles through the forest or struggling to turn on the lantern.
Trying to save money, I decided to play the game manually, copying the game’s output into ChatGPT and pasting its commands back. Of course, GPT-4o had no problems. I also tried GPT-4o mini, but it ran into the same issues as my smaller LLaMA 3.1 model. In contrast, the full GPT-4o breezed through the game, handling the inventory, defeating the troll, and exploring the dungeon with no trouble.
Meanwhile, Meta’s LLaMA 3.1 model with 405B parameters got lost in the forest, and Gemini got stuck on the examine command. Zork doesn’t provide much information on examining objects, and Gemini ended up gaslighting me, suggesting that I might have a buggy version of the game or was making typos in the commands.
Claude performed reasonably well, but I didn’t explore much further with it because I was tired of copy-pasting commands endlessly. I also tried LLaMA 3.1 with 70B parameters. Since it runs painfully slow on my laptop, I set it up and lay on the bed watching YouTube on my Quest 3 for two hours. When I came back, I found the model stuck in a ridiculous loop, still trying to open a door that was nailed shut.
TL;DR: I’m quite surprised that LLMs struggle so much with a seemingly simple game. Maybe one day they’ll finally get past the troll—until then, I’ll stick to YouTube.
Making commits where the number of deletions is higher than the number of insertions is satisfying.
129 insertions(+), 177 deletions(-)
Something’s wrong with gaming if you need to wait 4 hours to download a game despite having a 1 Gbps internet connection.
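Back-of-the-envelope, the numbers make it look even worse:

```python
# What 1 Gbps should move in 4 hours, ignoring overhead:
bytes_per_second = 1e9 / 8           # ~125 MB/s
total = bytes_per_second * 4 * 3600  # ~1.8e12 bytes
print(f"{total / 1e12:.1f} TB")      # 1.8 TB; no game is that big
```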
🌸 Wed, Oct 9, 2024 🌸
My long COVID condition is not improving; in fact, it’s getting worse. I became too confident last week and pushed a bit harder with exercises and my game development hobbies. A huge wave of phantom tobacco smell is keeping me awake, and it’s bothering me too much. I can’t sleep.
🌸 Tue, Oct 8, 2024 🌸
Lost internet. Kind of unusual.
🌸 Tue, Oct 8, 2024 🌸
I was helping my son rip a CD, and we stumbled upon software called Nero Burning ROM. I used to use software with a similar name about 15 years ago, and it just occurred to me that the name comes from the Roman Emperor Nero, who allegedly burned Rome.
Days I did not think about the Roman Empire: 0.
🌸 Tue, Oct 8, 2024 🌸
Current internet is FUBAR; it’s time for a new internet.
🌸 Tue, Oct 8, 2024 🌸
I’ll pause on upgrading to iOS 18, as it looks like they have some battery issues.
🌸 Tue, Oct 8, 2024 🌸
I asked ChatGPT to make jokes about the Roman Empire:
- The only time I think about the Roman Empire is when I’m trying to remember if it came before or after dinosaurs.
- Thinking about the Roman Empire is like flossing: I know I should do it more often, but life just gets in the way.
- The Roman Empire fell, but at least they didn’t have to deal with group chats.
- The real reason Rome fell? Too many gladiators calling in sick.
- If Rome had YouTube, their tutorials would all be ‘How to Build an Empire and Lose It in 3 Easy Steps.’
- If Rome had emojis, their favorite would be the thumbs up—just like the gladiator matches!
- If Rome had text messaging, Caesar would’ve received a lot of ‘Watch your back’ DMs.
- If Rome had food delivery apps, every order would come with a free side of ‘bread and circuses.’
- If Rome had TikTok, all the gladiators would be doing sword dance challenges in front of the Coliseum.
- If Rome had customer service, every complaint about invasions would get the response: ‘We’re expanding our coverage, please be patient.’
🌸 Tue, Oct 8, 2024 🌸
Wait, what? Analytics is a paid service on X now? I do not look at analytics often, and I feel it is wrong to charge for it.