I have always had a strange obsession with the media we use to store music. Back in elementary school I loved using our home tape recorder to make mix tapes. It left me with so many good memories. These days my dream is to own my own cassette Walkman.

But this is not easy to pull off. Walkmans are expensive and come with plenty of headaches. (My birthday wish this year was that some kind friend would gift me one. But I know you would call me shameless for even thinking it.)

To fill that deep consumerist void, half a year ago I settled for buying a FiiO Echo Mini. It is an MP3 player designed to look like a mini cassette machine.

This Device Is Pretty Bad

After using it for a while I developed some complicated feelings. It is decent in some ways, but the overall execution is terrible.

The most painful thing to look at is that large pixelated screen. Fifteen years ago it might have passed as an average display, but in 2026 staring at it just feels odd. Even a pixelated screen could have been turned into an advantage with good visual design (say, authentic retro pixel art). The real problem with this device is its awful UI. The moment you power it on, you are greeted by an interface so distorted it makes you squint like the old man peering at his phone on the subway.

I keep complaining to my friends that the designers must be young kids. The mistakes they made in design execution are ridiculously bad. Every single icon is completely unaligned to the pixel grid, so everything looks blurry.

In case you do not know: back when older designers were working, screen resolutions were low, so to keep images sharp they avoided placing anchor points at half-pixel coordinates, which triggers sub-pixel anti-aliasing and blurs the edges. The problem here is that many of the icons are very small, and once blurred they smear into indistinguishable blobs. You stare at them with your head full of question marks: what the hell is this supposed to be? The most absurd part is that while the icons are blurry, the text is rendered with crisp pixel fonts.

For example, when you are upgrading the firmware, the screen shows an interface drawn in the ugly default SimHei and SimSun system fonts, and the prompt text is not even centered on the screen! In daily use, if you switch the language to French you will notice that accented letters and regular letters are rendered at different sizes, so the characters sit at uneven heights.

There is something even more surreal. All the bitmap UI assets look as if they were exported in some lossy format, converted to BMP, and then embedded into the firmware. Every texture has a mysterious white halo along the edges between light and dark areas, and the details are full of compression artifacts and noise. In short, combined with the mediocre physical controls, using the device becomes physically and mentally uncomfortable.

Of course it has its good points. It looks cute and it is very small. You can carry it around as a social conversation piece. My favorite feature is that it has two headphone jacks! One balanced output and one standard 3.5mm jack. They can even work simultaneously! It reminds me of sharing a single pair of headphones with friends back in school. Now with this device you can share music with three other friends at once. Four people jamming together is hilarious.

There are also some even crazier ways to use it.

As we all know, Sony noise-cancelling headphones basically crank the bass to the maximum and blast your head with it, while the high frequencies sound muddy and unpleasant. I happen to have a pair of flathead earbuds (a YINCROW model I bought recently). They handle highs well but lack bass. So the question became: can I let one handle the highs and the other handle the lows?

To boost the bass I set Clear Bass all the way to +10. The flatheads ran without any extra EQ.

When I wore both headphones at once (flatheads inside my ears with the XM3s over them) and took turns unplugging one, something interesting happened. Unplugging the Sony made the sound instantly thinner. The guitar harmonics were still there but the low-frequency body resonance disappeared. Unplugging the flatheads made everything turn muddy as the Sony's dullness became obvious. When both played together, real music finally appeared.

But this setup has its costs. The output phases do not match, which makes the sound feel particularly dead. It is like swimming in a concrete pool. The sense of space suffers greatly.

The overall picture is this: the hardware is good but the software is bad, and the badness is remarkably consistent and thorough. Not only is the UI design bad; the engineering practices are bad too. I will get to that later.

Later on I got so annoyed (actually I was not that angry, just had an idea) that I started trying to mod the firmware.

Firmware Modding

First Exploration

I have almost no experience with reverse engineering. The last time I did anything like this was in junior high and I have forgotten everything. My highest level of experience with "reverse engineering" is looking at minified JavaScript code. This kind of work is probably not something a non-professional can handle.

But we have large language models, right?

I tried asking GLM 4.7 to take a look at the general shape of the firmware. It actually managed to analyze some useful information.

At the very beginning I told it: "Here is a firmware. My goal is to extract the bitmaps inside. Tell me how to do it."

It told me it needed capstone, ghidra, r2pipe, rzpipe, and angr. Installing the Python bindings for these on NixOS was a bit troublesome, but I eventually got it set up.
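
I did not save the exact commands it ran, but the first pass looked roughly like the r2pipe triage below. This is a reconstruction of the idea rather than the model's actual script, and the firmware file name is just a placeholder.

```python
# A minimal sketch of a first look at an unknown firmware blob with r2pipe.
import r2pipe

r2 = r2pipe.open("firmware.bin")

# Printable strings often leak the chip name, SDK version, and asset file names.
for s in r2.cmdj("izzj")[:50]:   # JSON list of strings found across the file
    print(hex(s["vaddr"]), s["string"])

# Entropy bars: flat high-entropy stretches hint at compression or encryption,
# while structured regions hint at code, fonts, or bitmaps.
print(r2.cmd("p=e"))

r2.quit()
```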

Then it started scanning the binary file on its own. Before long it had identified the chip model and the SDK it was built on. It even found the partition table. It tried reading the instruction sequences, decided it had discovered some kind of firmware encryption, and wrote a "correction" script to undo it (this becomes very important later).

After that it climbed into the embedded file system inside the file and pulled out all the .bmp files. When I looked, the images really matched what I expected.
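
Carving BMPs out of a flat firmware image is conceptually simple. Here is a rough sketch of the idea, not the model's actual code: look for the "BM" magic, read the declared file size, and keep anything plausible. The size limits and file name are my own placeholder choices.

```python
# Carve candidate BMPs out of a flat firmware image by scanning for the header.
import struct
from pathlib import Path

data = Path("firmware.bin").read_bytes()
out_dir = Path("carved")
out_dir.mkdir(exist_ok=True)

pos, count = 0, 0
while (pos := data.find(b"BM", pos)) != -1:
    # Bytes 2..6 of a BMP header hold the total file size (little-endian).
    (size,) = struct.unpack_from("<I", data, pos + 2)
    # Filter out random "BM" byte pairs whose size field is implausible.
    if 64 <= size <= 512 * 1024 and pos + size <= len(data):
        (out_dir / f"{pos:08x}.bmp").write_bytes(data[pos : pos + size])
        count += 1
    pos += 2

print(f"carved {count} candidate BMPs")
```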

But extracting them came with problems. The BMP color encoding just would not match. This was where I had to step in. After all it has no eyes, but I do. I was responsible for looking while it kept trying different parameters. In the end we managed to figure out the correct decoding parameters together.
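
The loop we fell into looked roughly like this: have the model decode the same raw buffer under different assumptions, save a preview for each, and let my eyes pick the winner. The sketch below shows the idea with a hypothetical 16-bit RGB565/BGR565 buffer; the dimensions, file name, and candidate formats are illustrative, not the Echo Mini's actual parameters.

```python
# Decode one raw pixel buffer under several assumptions and save previews.
from PIL import Image

def decode_565(raw: bytes, width: int, height: int, bgr: bool, swap: bool) -> Image.Image:
    img = Image.new("RGB", (width, height))
    pixels = []
    for i in range(width * height):          # assumes len(raw) >= 2 * width * height
        lo, hi = raw[2 * i], raw[2 * i + 1]
        if swap:                             # try both byte orders
            lo, hi = hi, lo
        v = (hi << 8) | lo
        r5, g6, b5 = (v >> 11) & 0x1F, (v >> 5) & 0x3F, v & 0x1F
        if bgr:                              # try both channel orders
            r5, b5 = b5, r5
        # Expand 5/6-bit channels to 8 bits.
        pixels.append((r5 * 255 // 31, g6 * 255 // 63, b5 * 255 // 31))
    img.putdata(pixels)
    return img

raw = open("asset_dump.bin", "rb").read()
for bgr in (False, True):
    for swap in (False, True):
        decode_565(raw, 128, 160, bgr, swap).save(f"preview_bgr{bgr}_swap{swap}.png")
```

The human-in-the-loop part is simply opening the four previews and seeing which one is not a psychedelic mess.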

Font Extraction

Because the device uses pixel fonts, the simplest assumption was that glyphs are stored as a plain bit sequence: 0 for background, 1 for a lit pixel. So I took out my phone, photographed some Chinese characters on the screen, and started counting the pixels one by one, typing them into a text document as search samples. Thankfully the font's pixels are huge, so I could clearly see every single dot.

I gave GLM 4.7 the characters I had extracted and told it: "Just these few characters. Try to scan for them."

The result was surprisingly good. By trying various padding methods it actually located where the pixel font data was stored.
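
The search itself is not mysterious. A sketch of the idea, with a made-up stand-in glyph instead of the characters I actually transcribed: pack the hand-counted bitmap into bytes under a few padding and bit-order assumptions, then grep the firmware for each variant.

```python
from pathlib import Path

# A 16-pixel-wide stand-in glyph transcribed as rows of '.' and '#'
# (the real samples were counted dot by dot from photos of the screen).
GLYPH = [
    "................",
    "..############..",
    ".......##.......",
    ".......##.......",
    ".......##.......",
    "..############..",
    "................",
]

def pack(rows, row_bytes, msb_first=True):
    """Pack the glyph into bytes, padding each row to `row_bytes` bytes."""
    out = bytearray()
    for row in rows:
        bits = [1 if c == "#" else 0 for c in row]
        bits += [0] * (row_bytes * 8 - len(bits))   # pad the row to a byte boundary
        for i in range(0, len(bits), 8):
            chunk = bits[i:i + 8]
            if not msb_first:
                chunk = chunk[::-1]
            out.append(int("".join(map(str, chunk)), 2))
    return bytes(out)

fw = Path("firmware.bin").read_bytes()
for row_bytes in (2, 4):              # tightly packed vs word-aligned rows
    for msb in (True, False):         # bit order within each byte
        hit = fw.find(pack(GLYPH, row_bytes, msb))
        print(f"row_bytes={row_bytes} msb_first={msb} -> "
              f"{hex(hit) if hit != -1 else 'no match'}")
```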

Although we had found the glyph data, we did not know its structure or where the index table was. This began a long period of groping in the dark. One issue was especially annoying: every other character was torn. Its left half shifted up a bit while its right half shifted down, and we had no idea why.

We had no choice but to look at the text rendering logic in the firmware. Since we already knew where the characters were, we explored around those addresses and eventually found something. In less than half an hour it located the rendering function. Then we climbed upward step by step until we found the rendering formula.

But this still could not explain the image tearing. To solve this we even started suspecting issues at the software-hardware communication protocol level. We built a mini rendering simulator involving VOP DMA, but we still could not find the problem.

Something had to be wrong. I was stuck in a dead end, thinking it was a problem with our extraction algorithm.

The tearing pattern was clear: left half shifted up, right half shifted down, happening every other character. So I asked GLM 4.7 to find the offset pattern. One very annoying thing kept happening. It would often output large amounts of ASCII art. Even when the characters were clearly distorted it would confidently tell me that they were correct and complete Chinese characters.

But the characters were obviously torn apart.

LLMs are text-based models. They can only read data line by line horizontally. What they see is strings made of dots and hashes. They can guess from statistical patterns that something "looks like a Chinese character." But they have no two-dimensional vision at all. What they see is completely different from what human eyes see. When you ask it to judge if a glyph is correct, it does not give you visual judgment. It gives you a probability prediction, and that prediction is very easy to get wrong.

Also, it loves statistical modeling. It thinks "80% match" is better than "50% match" even if what it is comparing is not the same thing at all. At that point you have to patiently explain that for this task there is only completely correct or completely wrong. There is no "half correct" situation.

This happened over and over, more than ten times. Every time I said "Are you sure this is a Chinese character? It is torn and the shape is wrong. It looks like multiple characters stitched together," it would apologize and re-analyze; then the context would get compacted and the important lessons would be lost in the summary. I would explicitly tell it "You need to read column by column, not row by row." It would comply, then after a while automatically revert to row-reading mode. Tell it again, and it reverts again. Your instructions are only temporary overrides, not real changes to how it works.

During this process I lost my temper several times and started swearing. Afterward I realized this was a very specific problem: Once you inject emotion into the conversation it stays in the context. It even persists through compression (but important methodological lessons do not). It keeps polluting future reasoning quality. After swearing at it, it started putting a lot of energy into calming my emotions instead of solving the technical problem. It tried every way to avoid triggering my feelings and eventually did nothing. It was very similar to a human fight-or-flight response. The result was that reasoning got weaker, I got more annoyed, and it became a vicious cycle.

This is a very practical lesson: When collaborating with LLMs, your emotional state is an engineering variable, not a private matter.

Later I started a fresh context and asked the LLM to build a visualization tool that mapped rendering results back to ROM addresses, and I went through the blocks one by one with the new context to confirm them. At some point it asked me: "Where did you get this firmware file from?" I showed it the "correction script." It told me that the correction script was what had been wrecking the analysis all along: it had interpreted the firmware incorrectly and corrupted the data structures, and that was where the character tearing came from.

At this point the mystery was solved.

Reverse Engineering the Character Lookup System

So far we had only found characters in the CJK range. We knew nothing about other regions. We saw large areas of pure zero blank data between planes, then a continuous segment of data, more large blanks, then another big chunk of data. The LLM said the data was stored in blocks and cleverly avoided explaining where all those pure zero areas came from.

According to the overall logic, if data was stored in segments then there should be a table. It started generating all kinds of hallucinations trying to force the "lookup table" hypothesis to fit.

As you know, models are extremely bad at math. I gave it several sample characters and it tried to "find patterns" but kept telling me there were none and that it must be done via table lookup. So I collected more samples. I even generated several rows of characters using CJK code points, had the machine render them, took photos, and used Gemini Canvas to write two tools that generated a huge number of samples for it to search.

One tool was a binary sequence reconstruction tool. You feed it a photo slice, it divides each character into a 16x15 grid, builds a brightness histogram of the cells, and adds a draggable separator line: cells on the dark side of the line are marked black, cells on the light side white, and the binarization happens automatically. After that I only needed to take photos, upload them, and adjust the separator line, and the glyphs would be traced out by themselves.
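
The core of that tool fits in a few lines. Below is a rough Python equivalent of what the web page does, assuming the 16x15 glyph grid; the draggable histogram becomes a plain threshold argument here, and the crop box, file name, and threshold are placeholders you would tune per photo.

```python
from PIL import Image

GRID_W, GRID_H = 16, 15   # the glyph grid used by this font

def photo_to_glyph(path: str, box: tuple, threshold: int) -> list[str]:
    """Crop one character out of a photo and binarize it onto the glyph grid."""
    cell = (
        Image.open(path)
        .convert("L")                # grayscale
        .crop(box)                   # the character's bounding box in the photo
        .resize((GRID_W, GRID_H))    # shrink so each glyph pixel becomes one cell
    )
    rows = []
    for y in range(GRID_H):
        # Bright cells count as lit pixels here; flip the comparison if the
        # screen shows dark glyphs on a light background.
        rows.append("".join("#" if cell.getpixel((x, y)) > threshold else "."
                            for x in range(GRID_W)))
    return rows

for line in photo_to_glyph("screen_photo.jpg", (120, 80, 220, 175), threshold=110):
    print(line)
```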

The other tool: Given a bitmap, it automatically generates a character table for the 100 Unicode code points before and after it. Clicking a character copies its code point. This was because I could not find a good Unicode code point lookup website at the time, so I built one myself.
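
A console version of that second tool is almost trivial; the starting code point below is arbitrary.

```python
# Print the 100 code points before and after a guess, so you can eyeball
# which character a glyph you dug out of the ROM might correspond to.
import unicodedata

def codepoint_table(center: int, radius: int = 100) -> None:
    for cp in range(center - radius, center + radius + 1):
        ch = chr(cp)
        name = unicodedata.name(ch, "<unnamed>")
        print(f"U+{cp:04X}  {ch}  {name}")

codepoint_table(0x4E2D)  # start around one CJK character and scroll from there
```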

But GLM kept forcing interpretations in the wrong context. That went on until I exported all the scanned data, had Gemini Canvas do a linear regression fit, and sent the resulting formula back...

Hey! It still would not listen, because the formula did not match the instruction-sequence logic it had seen. It insisted the width calculated from the code was 32, while the actual rendering clearly used 33. This issue stalled us for two full days.

Fortunately I have good habits when working with LLMs. Every time I reach a research result I immediately write it up as a Markdown note, and I maintain the whole document library with some care. So I fed the entire document library to NotebookLM and pasted in the full problem context as well. NotebookLM spotted it immediately: the compiler had optimized multiplication by 33 into "left shift by 5 bits, then add the original value," i.e. (x << 5) + x. Shifts are cheaper than multiplication, so compilers make this strength-reduction substitution automatically. GLM had not recognized the pattern when reading the decompiled code, so it kept interpreting the parameter as 32.
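
The pattern itself is tiny once you see it. A quick illustration (not the actual decompiled code):

```python
# A compiler turns x * 33 into a shift plus an add; if you only notice the
# "<< 5" you read the stride as 32 -- exactly the red herring GLM fell for.
for x in range(0, 2048, 97):
    assert x * 33 == (x << 5) + x   # (x << 5) alone would be x * 32
print("multiply-by-33 and shift-add forms agree")
```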

Here I learned two important things. First, diligent note-keeping pays off. Second, complex tasks need multiple models discussing them together; a single model can easily fall into a knowledge blind spot. In fact Grok recently added an integrated multi-model discussion mode, which looks quite interesting.

Context Management

By this stage one obvious problem became clear: Once your context window gets large and contains too much stuff, the model becomes stupid.

At that point the various documents, logs, and code snippets we had generated were already several thousand lines long. It started talking nonsense, complaining nonstop, and executing in a very rigid step-by-step manner.

The accumulated documents kept growing longer and longer. If a single document could eat up half the context, the entire research could not continue. It was time for house cleaning.

Structured Archiving

The first step was to throw all accumulated documents at Claude at once and ask it to organize them into a tree directory structure. Each knowledge point became its own document. Documents referenced each other, and there was an index document as the entry point. Later when facing a specific problem I only needed to feed the directory plus one or two relevant small files to the model instead of the entire research history. Context consumption dropped dramatically and the model could run longer and more accurately.

But this only solved the existing document problem. GLM has a bad habit: If you ask it to remember one document it will secretly create three. Your document library quickly turns into a garbage dump. So using additional models to maintain the document library is very important (though I later discovered NotebookLM does this more efficiently).

Problem Tree

Simple archiving was not enough because it only handles "known knowledge." During reverse engineering the problems themselves grow dynamically.

I maintained a special problem tree document. It was not a simple todo list. It recorded the dynamic evolution of the entire problem space. A big problem would spawn several sub-problems during research. Solving part of a sub-problem would expose new unknowns. Those new unknowns would split into even more specific small problems. This tree fully recorded the fission process.

This structure brought two benefits. First, it provided a high-dimensional abstract description of the problem space. At any time you could clearly see "where we are now, what we still do not know, and what the current biggest bottleneck is" instead of drowning in scattered technical details. Second, and more importantly, it greatly helped keep the context clean.

Specifically: Every time I started a new push I did not stuff all research documents into the model. I only pasted the current state of the problem tree. Then I picked the most important sub-problem and let the model generate a task guidance document around that single problem. The model received a highly abstracted and compressed problem description instead of the raw derivation process accumulated over days. Its context stayed clean and reasoning quality remained normal.

This was especially critical in the later stages of the project. When research reached the deepest point and GLM 4.7 started talking complete nonsense, I opened a brand new conversation window, pasted the current state of the problem tree, picked the most important sub-problem, and let GLM work on the checklist. The reason was simple: It received a clean, abstracted problem instead of context polluted by days of noise.

The core logic of this method is: Large language models have a limited effective working window. The more information you stuff in, the less space it has for actual thinking. What the problem tree does is compress the knowledge state of a complex project into an abstract structure that can be picked up and fed at any time. It does not record what you have done. It records what you still do not know and the relationships between those unknowns.

NotebookLM

Throughout the entire reverse engineering project GLM handled the main execution work: scanning binaries, decompiling, writing code, tracing call stacks. But it had an unavoidable flaw: It follows your train of thought too easily.

You give it a direction and it will push along that direction until it hits a wall. Then it spins in place beside the wall instead of taking a step back to question whether the direction itself is wrong.

NotebookLM played a completely different role in this project.

Task Document - Task Report Collaboration Mode

The core of this workflow is splitting knowledge accumulation and task execution into two independent contexts.

NotebookLM is responsible for the knowledge side. Its document library stores all research achievements, the problem tree, and technical documents. It does not participate in specific execution. It only reads documents and outputs task documents.

GLM is responsible for the execution side. It receives the task document, focuses on execution, and outputs a task report when finished. Its context contains only the execution details of the current task. It does not need to remember the entire project history.

The specific flow is like this:

When the project gets stuck or one stage ends, I let NotebookLM read all current research achievements in the document library and generate a task document. The structure is fixed: Problem statement (my understanding of where you are stuck), step-by-step topics (breaking the big problem into three or four manageable small goals), task checklist (under each goal listing specific steps with clear actions and purposes).

I give this task document to GLM to execute exactly as is. No personal explanations, no emotions. Just a simple structured document.

After GLM finishes execution it outputs a task report: Research goals, key findings, conclusions, remaining issues, and next step suggestions. Again in a fixed structure.

I upload this task report to NotebookLM's document library. NotebookLM reads the latest report together with all existing documents and generates the next task document.

Repeat the cycle.

On the surface I am only passing documents back and forth between two models. But it solves a very troublesome problem: context pollution.

In normal human-machine conversation mode, your emotions, unprofessional descriptions, rambling nonsense, and wrong assumptions all accumulate in the context. They get preserved through the compression process and continue interfering with the model's reasoning quality. Swear once and that sentence stays. Give a wrong premise and that premise stays. Take a long detour and that detour stays. The longer the context, the more garbage accumulates, and the model's reasoning ability becomes weaker.

The task document mechanism cuts off this pollution path. NotebookLM only reads structured documents, so there are no chat records between you and GLM. GLM only executes the task document and does not need to understand the full history of the project. What passes between them is all formatted, emotion-free, refined information.

This is equivalent to indirectly expanding the entire system's effective context space. Instead of making one model's context window larger, we separate knowledge and execution storage so each side only carries the information it truly needs.

After GLM sees a task document it often shows an "excited" attitude, saying "Oh, so this is how it can be done!" Then GLM can run for another stretch. After a while it cries again. I throw the new problem statement to NotebookLM...

Basically I became a "human pipeline" (SPL, System Prompt Line). I watched them chat enthusiastically but I myself was confused. I did not understand many details. I was only responsible for passing messages between them. I stayed confused the whole time and could only clap from the side: "You experts are amazing!"

However, this later got a more automated solution: after I started using the NotebookLM Skill, I only needed to write good prompts and let the two of them talk to each other. I no longer had to copy and paste back and forth.

Letting Two Models Argue

This mechanism also has a variant: when there was a dispute over a technical issue, I would have the two models critique each other's proposals.

The prompt is fixed: "Please systematically investigate using reverse analysis tools and respond to the following viewpoint using constructive criticism."

GLM gives a conclusion and hands it to NotebookLM for critique. NotebookLM gives a correction and hands it back to GLM for critique. After several rounds a properly double-checked analysis framework naturally emerges. When facing human input alone both models tend to be compliant. But when facing another model's conclusion their critical thinking clearly improves. The hardest part of the character rendering pipeline was figured out using this method.

In this mechanism I only did three things: Decide when to switch (when execution gets stuck, switch to NotebookLM), judge whether the conclusions in the task report are trustworthy (models have no eyes so visual judgment must come from me), and maintain the problem tree (decide what the next most important sub-problem is).

Everything else was completed automatically in this machine-to-machine closed loop.

This is the most important methodological discovery in the entire project: Although smart models are important, a working environment that does not make them stupid is equally important.

When the LLM Starts Talking Nonsense, It Means You May Not Have Fed It Enough Information

This is a rule that was repeatedly verified throughout the project. It is also the most counter-intuitive one: When an LLM starts hitting walls everywhere, spinning in circles, and producing increasingly ridiculous conclusions, your first reaction should not be to change the prompt. It should be to ask it: What information are you missing?

First Time: Finding the TRM

Halfway through font reverse engineering GLM started showing typical "information starvation" symptoms. Static analysis repeatedly claimed "found function entry" but the addresses it gave did not exist in the firmware. The derivation chains became longer and longer and the conclusions more and more mystical. Every few rounds it would say "I have reached the limit of static analysis."

I stopped and asked it directly: What do you need right now to continue?

It said it needed the chip's Spec and TRM. Spec is the hardware specification document. TRM is the Technical Reference Manual containing the complete instruction set, register definitions, and memory map. Without these it could only guess, and it could no longer guess.

The chip model is Rockchip RKnano D. I searched and found a Spec, but it only had hardware-level parameters and no instruction set. The TRM was what we really needed but after searching all regular channels I found nothing.

During the search I found a Japanese website that listed two URLs pointing to hardware documentation and instruction set documentation. When I clicked them a 404 page popped up.

The TRM link was dead.

I stared at that 404 URL for a while. Then I placed the previously working Spec URL next to it for comparison. They had the same top-level domain but different second-level domains. One was a Wiki subdomain and one was the main site domain. The website had moved. The content had migrated from Wiki to the main site, but the Japanese website still recorded the old address.

I changed the subdomain in the TRM URL to the main site format and refreshed.

The file downloaded successfully.

This was a problem that could only be solved through human thinking patterns. Almost no model would stare at a 404 URL and think "What is the relationship between the domain structures of these two links." It would only tell you "File does not exist" and suggest searching with different keywords. Discovering URL structure patterns, inferring website migration, and manually modifying the path requires implicit human experience about how websites operate. It is not something models can do.

After getting the TRM I handed it to NotebookLM. It immediately found the hardware mechanisms corresponding to those previously "non-existent" addresses. The analysis clues that had been stuck for a long time started working again.

I also manually backed up this TRM to the Internet Archive and discovered that someone had backed it up several years ago. It just had not been indexed by search engines. I had worked for nothing, but it only wasted a little time.

Second Time: Treating the Device Itself as Documentation

When we advanced to the hardware simulation stage GLM started talking nonsense again when judging certain specific hardware specifications. This time instead of looking for external documentation I thought of another information source: the device itself.

I organized all the player's software and hardware version numbers along with the product specifications listed on the official page and gave them to GLM together.

This information looked ordinary and even seemed like filler. But it provided important data anchors for basic hardware version judgment. Following the conditional logic tree all the way back we found several places that did not match previous reasoning. Chasing back along these contradictions we broke through the core hardware specification judgment problem. In the end we fully captured the complete call stack for text rendering. This directly led to the completion of Flame Ocean's font lookup and replacement tool.

Flashing into the Device!

After research advanced to a certain stage one thing had to be clarified: Does flashing have signature verification?

No matter how beautifully you reverse-engineer the firmware, if the device verifies the firmware signature during writing then all efforts are wasted.

I first had GLM scan the entire firmware and asked if it found any signature verification or security protection mechanisms. It confidently said: No protective measures were found.

But this conclusion was shaky. "Not found" does not equal "does not exist." It is the classic conditional-probability trap: just as a non-significant p-value does not prove there is no effect, absence of evidence is not evidence of absence.

So I changed the requirement: Do not draw conclusions out of thin air. You must use decompiled data for actual evidence. Find the code and see the logic before speaking.

Next I had the model scan carefully and it actually found something. There was a CRC verification function in the firmware. But this function was very strange: It only returned an 8-bit result.

At the same time, at the very end of the firmware there was a segment of data that structurally looked a lot like some kind of verification hash. I had roughly researched it before. Rockchip has its own RKCRC algorithm. The location and format of this data matched well. But after chasing for a long time I could not find any verification logic that read or compared this data.
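
The probing we did against that trailing data amounted to brute-force guessing. It looked something like the sketch below, where the tail length, polynomial, and candidate checksums are all guesses for illustration and not the real RKCRC parameters:

```python
# Compute a few cheap checksums over everything except the tail and see whether
# any of them reproduces the bytes stored there.
import functools
import zlib
from pathlib import Path

fw = Path("firmware.bin").read_bytes()
TAIL = 4                                  # assumed length of the trailing field
body, tail = fw[:-TAIL], fw[-TAIL:]

def crc8(data: bytes, poly: int = 0x07, init: int = 0x00) -> int:
    """Plain MSB-first CRC-8; slow in pure Python, fine for a one-off probe."""
    crc = init
    for b in data:
        crc ^= b
        for _ in range(8):
            crc = ((crc << 1) ^ poly) & 0xFF if crc & 0x80 else (crc << 1) & 0xFF
    return crc

candidates = {
    "sum & 0xFF":     sum(body) & 0xFF,
    "xor of bytes":   functools.reduce(lambda a, b: a ^ b, body, 0),
    "crc8 poly=0x07": crc8(body),
    "crc32 (zlib)":   zlib.crc32(body) & 0xFFFFFFFF,
}

print("stored tail bytes:", tail.hex())
for name, value in candidates.items():
    print(f"{name:>16}: {value:#010x}")
```

None of the guesses we tried lined up, which is part of why we eventually just risked flashing a modified package.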

The conclusion completely contradicted intuition. From the firmware structure it should have verification. From the code logic I could not find anywhere that was actually performing verification.

Static analysis could not give a definite answer so I boldly flashed a slightly modified package into the machine. It really ate my firmware package without any verification at all. There was no interception during the process.

I asked the hardware friends in the group. They told me that for this kind of low-end MP3 device verification is usually done in the flashing software, not on the hardware. Moreover Snowsky's flashing method is quite special. You only need to copy the firmware package to the root directory. The device will automatically detect it on boot and upgrade. No special flashing tool is needed. So...

Some friends also told me that even many car infotainment system firmware upgrades do not perform any integrity checks. If you feed it a firmware download that was interrupted halfway your car will become a giant black brick. That was quite an eye-opener.

The Tool Takes Shape

With all the research materials ready the next step was to write a small tool for resource editing and replacement. The entire tool was built with Svelte. I basically did not write much myself. I let the LLM help with simple component wrapping and the interface came together quickly. To be honest I did not care much about beauty. It was just a practical tool. For fun I named this tool Flame Ocean. As you can see it is the opposite of Snow Sky.

After finishing the tool I posted it on Reddit. Almost the same day someone immediately replaced the boot screen animation and sent a PR. It could automatically turn videos into sequence frame animations (although the quality of that PR was a bit low. I spent a whole day fixing its UX to an acceptable level. But I am still grateful for community participation).

Later various mod cases started appearing. The first few days were simple boot screen modifications. Upload the official firmware to flame-ocean.not.ci, click "Replace Image," download the new bin file, and flash it. A few minutes later your anime wife would be living inside the device.

A few days later more thorough skin mods started appearing. My favorites include an authentic cassette machine theme, an EVA theme, and a Fallout theme. A macOS-imitation theme is also quite good, and its author emphasized the importance of staying pixel perfect.

The community atmosphere is pretty good too. Later a website appeared specifically to collect modified firmwares. If a newbie asks a question someone will usually answer it soon after. Community-created modding tutorial documents also started appearing.

When LLMs Become Slot Machines

Existing discussions about the mental health risks of large language models focus on two directions: Using LLMs for psychological counseling actually makes mental health worse, and using LLMs to write code produces impostor syndrome. People have already talked about both of these. I do not want to repeat them.

In this article I want to talk about a problem that few people discuss: continuous reward stimulation.

Throughout the entire reverse engineering project I stayed up until three in the morning many nights. During a group project meeting my classmate heard my voice and asked if I was okay. I said I was fine. But my watch told me I had been in a long-term stressed state. My body and mind were under continuous strain and I had not noticed it myself.

The reason was not that the project was difficult. It was that large language models created an extremely high-density real-time positive feedback loop.

Every time I issued a task I would get a progress report within five minutes. Every report described the current progress as a breakthrough. Solving one sub-problem immediately revealed the next sub-problem waiting. This cycle had no natural stopping point. There was no "done for today" moment. There was only endless "just one more push and we will have it."

For people with ADHD this pattern is neurologically identical to casino slot machines. The design principle of slot machines is variable ratio reinforcement. You do not win every time but the timing of wins is unpredictable. This uncertainty creates much stronger dopamine stimulation than fixed rewards. The collaboration process with large language models perfectly matches this pattern. Most of the time you are advancing without breakthroughs. Occasionally a key discovery suddenly appears, then you immediately fall into the next unknown. You never know whether the next task document will end in a flat conclusion or a major breakthrough. But you desperately want to find out.

One more round. One more push. Maybe this time it will come through.

This affects ordinary people too, but its destructive power on ADHD patients is another story.

One of the core mechanisms of ADHD is impaired executive function. People with ADHD have difficulty starting tasks, but once they enter a state of intense interest they have extreme difficulty disengaging voluntarily. This is not a willpower issue; it is a neurological regulation difficulty. The high-frequency feedback from large language models triggers exactly the second half of this mechanism. It keeps nailing you to the chair with small rewards, leaving your brain in the illusion that "just a bit more and it will be finished."

At the same time I systematically underestimated the difficulty of this project. I do not understand reverse engineering, so I could not accurately assess how difficult it actually is to find the complete mapping formula from Unicode to glyphs. People who do not understand tend to underestimate difficulty. Underestimating difficulty makes you keep feeling "it will be done soon." And every progress report from the large language model constantly reinforces this illusion. I was imagining what success would look like. I was chasing an endpoint that I thought was within reach but was actually still far away.

This is a very insidious kind of suffering. It drains your energy bit by bit. You do not feel like you are suffering. You feel like you are sprinting with all your strength. Only when you are completely exhausted do you realize there is pain, but by then you cannot escape this high-speed real-time feedback environment at all.

The impostor syndrome problem is: The LLM did the work for you and you doubt your own ability. This is a question of self-perception.

The problem I am describing is different. It does not question your ability. It consumes your body. It exploits not your lack of confidence but your desire for progress, your obsession with completion, and the unstoppable reward circuit in your brain. It uses the same underlying mechanism as gambling addiction and phone addiction, only packaged as "I am doing something very meaningful."

As I write this article all the remaining unsolved problems have not yet been cleaned up. I have not found a good way to actively hit the brakes while maintaining efficient collaboration. The best I can do is explain this clearly so that the next person who falls into it at least knows that the feeling of being unable to stop is not just passion. It is also a signal that needs vigilance.

So What

Through this crazy amount of work the reverse engineering is basically complete. I used LLMs to write various small debugging tools. I even wrote a hardware simulator. In the end I basically figured out that mess of character rendering code.

But this process also makes one a bit worried.

Think about it. I am just a frontend guy who only knows how to deobfuscate JavaScript. Yet relying on two LLMs arguing with each other I was able to tear apart the underlying firmware of a device almost completely. What does this mean? It means that in the future the threshold for "script kiddies" has been lowered to almost nothing.

Moreover that fully automated future is already very close.

At the very beginning I was still the person sitting in the middle manually passing notes: Taking task documents from NotebookLM, pasting them to GLM for execution, taking the reports back, and sending them back to NotebookLM. The process was automated but scheduling relied on a human. But the appearance of the NotebookLM Skill eliminated even this step. I only needed to sit in my chair and watch.

In addition DeepSeek recently published a sparse attention mechanism that to a large extent alleviates the core pain point repeatedly mentioned in this article: Models become stupid once the context gets long. If this problem is truly solved the space for "human" participation will be further compressed. The decision of "which model to switch to now" that originally required a human can also start to be automated.

What is even more worth watching out for is another thing: Many very capable large models are open source. This means that as long as you have enough computing power a great many large language model behaviors can run completely on your own machine without going through any third-party services and without any platform auditing. A complete reverse engineering pipeline from firmware scanning and decompilation to vulnerability identification and exploit path derivation can theoretically run 24/7 in anyone's basement targeting any target.

Currently there is still one real threshold on this path: The reasoning ability of small models is still insufficient. Lightweight models like IQuesta Coder that claim to be SoTA cannot even handle basic file editing commands in OpenCode properly when facing slightly complex engineering tasks, let alone independently completing a full reverse analysis chain. Complex projects still require large models. Large models still require computing power. Computing power still requires money. This threshold still exists for now.

But it is getting lower.

My starting point for this project was simply that I hated how ugly the UI of an MP3 player was. Following this thread I ended up in a direction completely outside my professional field. Using two models and one manual I thoroughly explored the character rendering pipeline of an embedded chip from beginning to end. I am not a security researcher. I have no reverse engineering background. I just have patience, know how to ask questions, and understand when to feed what information.

If this were done by someone who truly understands security, or by a fully automated pipeline, or by a system that never sleeps, never has emotional breakdowns, and always has clean context, to what extent could it go?

Perhaps no idiot would want to change their car's boot screen to an anime wife. But if you are driving on the highway and a jump scare suddenly pops up on the car's screen, that would be quite scary.

That fully automated future is not science fiction. It is already on the way.