
Mastering Large Model Development from Scratch: Beyond the AI "Black Box"

Stop being a mere AI "API caller." Learn how to build a Large Language Model (LLM) from scratch. This guide covers the 4-step training process, RAG vs. Fine-tuning strategies, and how to master the AI "black box" to regain freedom of choice in the generative AI era.

Introduction: Why We Feel Alienated by AI

As we marvel at the power of AI every day, have you ever felt a little uneasy? We are always "using" AI, yet we feel alienated from, and even intimidated by, its inner workings.

This article records my journey from AI "API caller" to hands-on "wheel builder," and, step by step, to someone in control of the technology. My goal is to demystify Large Language Models (LLMs) for you and help you find a learning path that truly belongs to you.

1. Original Intention: Do We Really "Understand" AI?

I have been in the Internet industry for over a decade. Since 2015, I have witnessed AI's evolution from a professional tool in specific fields like financial risk control to the ubiquitous Generative AI we see today.

Like many, I have spent hours daily interacting with ChatGPT since its inception:

  • Learning: Mastering new concepts and techniques.

  • Programming: Developing small applications based on APIs.

  • Psychological Support: Chatting to relieve stress.

But deep down, I felt limited. I was just an API Call Engineer. Despite subscribing to top AI products and mastering Prompt Engineering, I felt like I was merely learning to talk to a "black box." I longed to open that box and take a closer look.

"Training a large model of your own from scratch is the most efficient way to improve your understanding of large models." — Yang Zhiping, Kaizhi Academy.

In technology, we often say "don't reinvent the wheel." But in the AI era, I realized that I needed to build a wheel myself to truly understand its structure and performance. I joined a training camp organized by Teacher Yang to start my R&D journey from zero to one.

2. Level 1: Dismantling the "Point-and-Shoot Camera"

The first challenge was to run the entire process—from data processing to model deployment—on an open-source small language model project (25 million parameters).

This project is like a "point-and-shoot camera": small but complete. It contains all core components: tokenizer, model architecture, and training scripts. Initially, the screen full of unfamiliar code was intimidating. However, I discovered that LLM training follows a fixed four-step process:

  1. Environment Configuration: Setting up dependent libraries and drivers.

  2. Data Preparation: Cleaning and formatting text.

  3. Execution: Starting the training scripts.

  4. Evaluation & Iteration: Testing and optimizing the model.
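The four steps above can be made concrete with a deliberately tiny sketch. This is not the training camp's actual code—just a toy character-free bigram "model" in plain Python (the corpus, function names, and counts are all made up for illustration) that walks through data preparation, training, and evaluation in a few lines:

```python
from collections import defaultdict

# Step 1 (environment configuration) is just the Python interpreter here.

# Step 2: Data Preparation -- clean and "tokenize" a tiny corpus.
corpus = "the cat sat on the mat. the cat ate."
tokens = corpus.lower().replace(".", " .").split()

# Step 3: Execution -- "train" a bigram model by counting transitions.
counts = defaultdict(lambda: defaultdict(int))
for prev, nxt in zip(tokens, tokens[1:]):
    counts[prev][nxt] += 1

def predict(word):
    """Return the most likely next token after `word`."""
    followers = counts.get(word)
    if not followers:
        return None
    return max(followers, key=followers.get)

# Step 4: Evaluation & Iteration -- probe the model on a known pattern.
print(predict("the"))  # "cat" follows "the" more often than "mat" does
```

A real LLM replaces the count table with a Transformer and gradient descent, but the shape of the pipeline—prepare data, run training, evaluate, iterate—is exactly the same.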

The Impact of Computing Power (GPU vs CPU)

I had a personal "moment of enlightenment" regarding Computing Power. When running pre-training on my local M4 chip, the system estimated 4,000 to 15,000 minutes. My memory maxed out, and the fans spun wildly.

Moving the same task to a Cloud GPU reduced the time to just 98 minutes. This taught me that computing power is the electricity of this era. Furthermore, I realized that environment configuration and driver incompatibilities are eternal challenges that must be faced with a calm mind.
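The device decision itself is usually a few lines of code. The sketch below mirrors the logic of PyTorch's `torch.cuda.is_available()` / `torch.backends.mps.is_available()` checks without importing torch (the `pick_device` function is illustrative, not a library API), plus the rough arithmetic behind the speedup I observed:

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Return the fastest available device, mirroring the usual
    torch.cuda / torch.backends.mps availability checks."""
    if cuda_available:
        return "cuda"  # cloud or desktop NVIDIA GPU
    if mps_available:
        return "mps"   # Apple Silicon (e.g. an M4 chip)
    return "cpu"       # slowest fallback

# Moving pre-training from a laptop estimate (4,000 min, low end)
# to a cloud GPU (98 min) is roughly a 40x speedup:
speedup = 4000 / 98
print(pick_device(True, False), round(speedup))
```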

A "Stupid" but Effective Learning Method

To understand the internal structure, I used AI to assist my learning:

  • AI Code Documentation: Asking AI to add detailed Chinese comments file-by-file.

  • Workflow Diagrams: Generating logic flowcharts from the annotated code.

  • Deep Learning Concepts: Using the "Meta-Anti-Empty" framework to analyze tokenizers, learning rates, and parameters.

3. Advanced Challenge: From "Point-and-Shoot" to "SLR"

The second level involved modifying and building an "SLR camera"—manually transforming a classic open-source architecture into a newer, more efficient domestic model architecture. This is like transplanting the "heart" (the sensor) of a Nikon into a Canon body.

Overcoming Technical Pitfalls

I encountered numerous issues:

  • Vocabulary Mismatch: A single wrong number in the config file led to garbled output.

  • Weight Loading: I had to compare layer names and data types line by line against the official technical reports.

I adopted Test-Driven Development (TDD), performing unit tests for every small modification. This "step-by-step" approach allowed me to understand the data flow. When the modified model finally produced its first coherent sentence, the joy was unparalleled. Elements like Aperture (Learning Rate), Shutter (Training Steps), and Sensitivity (Batch Size) finally became controllable.
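The kind of unit test that catches both pitfalls above can be very small. This is a hedged sketch, not the camp's actual test suite: it compares layer names and tensor shapes between a model and a checkpoint (represented here as plain dicts of name-to-shape; the layer names and the 32000-vs-32064 vocabulary typo are invented for illustration):

```python
def check_state_dicts(model_shapes: dict, ckpt_shapes: dict) -> list:
    """Compare layer names and tensor shapes between a model and a
    checkpoint; return a list of human-readable mismatches."""
    problems = []
    for name in model_shapes.keys() - ckpt_shapes.keys():
        problems.append(f"missing in checkpoint: {name}")
    for name in ckpt_shapes.keys() - model_shapes.keys():
        problems.append(f"unexpected in checkpoint: {name}")
    for name in model_shapes.keys() & ckpt_shapes.keys():
        if model_shapes[name] != ckpt_shapes[name]:
            problems.append(f"shape mismatch at {name}: "
                            f"{model_shapes[name]} vs {ckpt_shapes[name]}")
    return problems

# A vocabulary-size typo in the config shows up immediately:
model = {"embed.weight": (32000, 512), "lm_head.weight": (32000, 512)}
ckpt  = {"embed.weight": (32064, 512), "lm_head.weight": (32064, 512)}
print(check_state_dicts(model, ckpt))
```

Running a check like this after every small modification is what makes the "step-by-step" TDD rhythm possible: a wrong vocabulary size fails loudly at load time instead of surfacing later as garbled output.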

4. Value Analysis: Why "Build the Camera" Yourself?

Why learn photography principles when "one-click beauty" apps exist? The answer is to gain the wisdom of choice.

RAG vs. Fine-Tuning

Take RAG (Retrieval-Augmented Generation) as an example. It solves 80% of problems. But for the remaining 20%—such as complex industry "slang" or unique writing styles—RAG hits a bottleneck.

By "making wheels," you see that RAG and Fine-tuning are not mutually exclusive:

  • Trade-offs: You can judge if a problem is a knowledge retrieval issue (RAG) or a model capability issue (Fine-tuning).

  • Synergy: You can fine-tune an Embedding model to improve RAG retrieval accuracy for specific industry terminology.
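The retrieval half of RAG is simple enough to sketch in plain Python. The toy 3-dimensional vectors below stand in for real embeddings (the documents and numbers are invented); the point of fine-tuning an Embedding model is precisely to produce vectors where industry slang lands close to the queries that mean it:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve(query_vec, docs):
    """Return the document whose embedding is closest to the query."""
    return max(docs, key=lambda d: cosine(query_vec, d["vec"]))

# Toy corpus: each document carries a (made-up) embedding vector.
docs = [
    {"text": "quarterly risk report", "vec": [0.9, 0.1, 0.0]},
    {"text": "employee handbook",     "vec": [0.1, 0.9, 0.2]},
]
print(retrieve([0.8, 0.2, 0.1], docs)["text"])
```

If a domain query keeps retrieving the wrong document, that is an embedding-quality problem (fine-tune the Embedding model); if the right document is retrieved but the answer is still poorly phrased, that is a generation problem—exactly the trade-off the bullets above describe.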

5. New Horizons: Insights into AI Product Strategy

After mastering these technologies, my perspective on AI products changed. For instance, I noticed a popular office suite that, I believe, has made a strategic misjudgment.

It sits on a gold mine of internal data (IM, cloud documents, meetings)—the perfect "film" for an exclusive model. Yet it settled for shallow API calls powering peripheral features like filling in tables.

  • The IM box should be a Function Calling portal.

  • Cloud Documents should follow the Cursor model to integrate communication and creation.
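What "the IM box as a Function Calling portal" could look like is a thin dispatch layer: the model turns a chat message into a structured tool call, and the app executes it. This is a hypothetical sketch—the tool names, registry, and JSON shape are my own illustration, not any vendor's API:

```python
import json

# Hypothetical registry of app functions the model is allowed to call.
TOOLS = {
    "create_doc": lambda title: f"created doc '{title}'",
    "schedule_meeting": lambda time: f"meeting scheduled at {time}",
}

def dispatch(model_output: str) -> str:
    """Parse a model's structured tool call (JSON) and execute it."""
    call = json.loads(model_output)
    fn = TOOLS.get(call["name"])
    if fn is None:
        return f"unknown tool: {call['name']}"
    return fn(**call["arguments"])

# The LLM would emit something like this from a plain chat message:
print(dispatch('{"name": "create_doc", "arguments": {"title": "Q3 plan"}}'))
```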

Conclusion: Achieving "Freedom of Choice"

My biggest gain isn't just the model I trained, but a calm mentality:

  1. Reduced Anxiety: I understand that core AI technology hasn't fundamentally changed in a decade.

  2. Clear Judgment: I can choose the most cost-effective solution (RAG vs. Fine-tuning) for any business.

  3. Core Leverage: I have the foundation to master any AI-related technical capability.

You don't need to be a top scientist, but mastering the LLM development process helps you create unique value in your professional field. If you are curious about the "black box," start by disassembling a "point-and-shoot camera."

AI is the defining variable of the next 20 years. Don't walk alone—find excellent peers.
