Office Hours — Is traditional machine learning still worth learning and building with, or should you focus entirely on LLMs?

The honest answer is neither. Learn both; the balance depends on the problem you’re solving and the guarantees you need.

When Traditional ML Still Wins

LLMs excel at open-ended language tasks: summarization, text classification, generation, reasoning over text. They are a poor fit for deterministic aggregation, predictive modeling on structured data, and anything that needs reproducible, explainable decisions.

Consider a real example: you’re building a churn prediction system for a SaaS company. You have 12 months of historical customer data: subscription tier, feature usage, support tickets, payment history. An LLM prompt that says “predict if this customer will churn” will either hallucinate a decision or waste tokens on verbose reasoning you can’t audit. A gradient boosting model (XGBoost, LightGBM) trained on that tabular data gives you feature importance scores, probabilities you can calibrate and check against held-out data, and consistent outputs you can defend in a board room.
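To make the churn example concrete, here is a minimal sketch of what gradient boosting does under the hood. It is a pure-Python toy, not XGBoost or LightGBM: it boosts depth-1 trees (stumps) on log loss over a tiny made-up dataset. The feature names, the rows, and the split-count "importance" are all assumptions for illustration. On real data you would reach for one of the libraries and evaluate on a held-out set.

```python
import math

# Hypothetical toy data, one row per customer:
# [logins_per_week, support_tickets, months_subscribed]
FEATURES = ["logins_per_week", "support_tickets", "months_subscribed"]
X = [
    [1, 4, 2], [0, 1, 12], [2, 3, 3], [1, 0, 2],    # churned
    [9, 0, 14], [7, 5, 1], [8, 0, 9], [6, 1, 30],   # stayed
]
y = [1, 1, 1, 1, 0, 0, 0, 0]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, residuals):
    """Pick the single (feature, threshold) split with the lowest squared error."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    return best[1:]  # feature index, threshold, left value, right value


def fit_boosting(X, y, rounds=20, lr=0.5):
    """Each round fits a stump to the residuals (negative gradient of log loss)."""
    scores = [0.0] * len(X)
    stumps = []
    for _ in range(rounds):
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        j, t, lv, rv = fit_stump(X, residuals)
        stumps.append((j, t, lr * lv, lr * rv))  # learning rate folded into leaves
        scores = [s + (lr * lv if row[j] <= t else lr * rv) for s, row in zip(scores, X)]
    return stumps


def predict_proba(stumps, row):
    score = sum(lv if row[j] <= t else rv for j, t, lv, rv in stumps)
    return sigmoid(score)


def split_counts(stumps):
    """Crude importance: how many stumps split on each feature."""
    counts = {name: 0 for name in FEATURES}
    for j, _t, _lv, _rv in stumps:
        counts[FEATURES[j]] += 1
    return counts


model = fit_boosting(X, y)
at_risk = predict_proba(model, [1, 5, 2])
healthy = predict_proba(model, [8, 0, 12])
print(f"at-risk customer: {at_risk:.2f}, healthy customer: {healthy:.2f}")
print(split_counts(model))
```

The point is less the algorithm than the properties: the same data always yields the same model, every prediction traces back to explicit thresholds, and you can see which features the model leaned on.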

LLMs are also unreliable at exact arithmetic. If you’re building financial calculations, inventory optimization, or any system whose output is arithmetically verifiable, deterministic algorithms (with traditional ML where you need a forecast) are non-negotiable. The Daily Signal just covered this: teams are discovering that larger context windows don’t fix RAG for aggregation tasks, and that deterministic SQL outperforms AI-generated answers when accuracy matters.
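As a rough illustration of the deterministic SQL side, here is a sketch using Python's built-in sqlite3 module. The invoices table, its columns, and the amounts are hypothetical. Amounts are stored as integer cents so the sums are exact.

```python
import sqlite3

# Hypothetical in-memory table; amounts are integer cents to avoid float rounding.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (customer TEXT, amount_cents INTEGER)")
conn.executemany(
    "INSERT INTO invoices VALUES (?, ?)",
    [
        ("acme", 129900),
        ("acme", 4550),
        ("globex", 99900),
        ("globex", 1005),
        ("acme", 250),
    ],
)


def revenue_by_customer(conn):
    """Aggregate in the database: same rows in, same totals out, every time."""
    rows = conn.execute(
        "SELECT customer, SUM(amount_cents) FROM invoices "
        "GROUP BY customer ORDER BY customer"
    ).fetchall()
    return dict(rows)


totals = revenue_by_customer(conn)
print(totals)
```

Nothing here is sampled or approximated, so the result can be checked by hand and will not drift between runs.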

When LLMs Actually Replace Traditional ML

The shift happens when the problem isn’t structured. NLP tasks that used to require custom feature engineering, BERT fine-tuning, and weeks of iteration now work with a GPT-5.4 prompt and good examples. Computer vision tasks that demanded labeled datasets and CNN architectures can now be handled by vision LLMs like Gemini 3.5 Omni for document parsing, chart extraction, or visual QA.

The critical difference: LLMs trade interpretability and determinism for flexibility and speed to production. If your model needs to explain why it made a decision in a regulatory audit, a trained tabular ML model is still the right choice. If you need something working in two weeks that handles 50 different use cases reasonably well, LLMs win.

The Practical Pattern Emerging

Production systems are going hybrid. You’re not choosing between “traditional ML” and “LLMs”—you’re building pipelines that use both.

A customer support system might use:

A gradient boosting model to classify incoming tickets into buckets (deterministic, explainable, fast)
An LLM to generate responses within each bucket (flexible, context-aware, human-like)
A rule-based rejection filter to catch edge cases before they hit the customer
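The three stages above can be sketched as one small pipeline. Everything here is a stand-in under stated assumptions: classify_ticket uses keyword lookup where a trained gradient boosting classifier would go, draft_reply returns canned text where a real LLM call would go, and the blocked patterns are invented examples of rules a team might enforce.

```python
import re
from dataclasses import dataclass


@dataclass
class Reply:
    bucket: str
    text: str
    approved: bool
    reason: str


# Stage 1 stand-in: keyword lookup in place of a trained classifier.
BUCKET_KEYWORDS = {
    "billing": ("invoice", "charge", "refund"),
    "login": ("password", "sign in", "locked"),
}


def classify_ticket(ticket):
    text = ticket.lower()
    for bucket, words in BUCKET_KEYWORDS.items():
        if any(word in text for word in words):
            return bucket
    return "other"


# Stage 2 stand-in: canned drafts in place of an LLM call (hypothetical).
def draft_reply(bucket, ticket):
    drafts = {
        "billing": "Thanks for flagging this. We are reviewing the charges on your account.",
        "login": "Sorry about the trouble signing in. A reset link is on its way.",
    }
    return drafts.get(bucket, "Thanks for reaching out. An agent will follow up shortly.")


# Stage 3: rule-based rejection filter applied to whatever stage 2 produced.
BLOCKED_PATTERNS = [
    (re.compile(r"\bguarantee[ds]?\b", re.IGNORECASE), "makes a guarantee"),
    (re.compile(r"\brefund (has been|was) issued\b", re.IGNORECASE), "claims a refund was issued"),
]


def check_reply(text):
    for pattern, reason in BLOCKED_PATTERNS:
        if pattern.search(text):
            return False, reason
    return True, "ok"


def handle_ticket(ticket, generate=draft_reply):
    bucket = classify_ticket(ticket)
    text = generate(bucket, ticket)
    approved, reason = check_reply(text)
    return Reply(bucket, text, approved, reason)


ok_reply = handle_ticket("I was charged twice on my latest invoice")
blocked_reply = handle_ticket(
    "Where is my refund?",
    generate=lambda bucket, ticket: "Your refund has been issued, guaranteed.",
)
print(ok_reply)
print(blocked_reply)
```

Keeping the generator behind a plain function argument means the deterministic stages can be tested without calling a model at all.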

Vision systems parsing PDFs use:

OCR or vision LLMs to extract text from images
Deterministic SQL or structured queries to aggregate extracted data
LLMs only for interpretation or free-form reasoning about the results
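Here is a small sketch of the middle layer, assuming the extraction step has already returned plain text. The invoice text, the line format, and the regular expression are hypothetical, and a structured query in plain Python stands in for SQL. Lines that do not parse cleanly are set aside for review instead of being guessed at.

```python
import re
from decimal import Decimal

# Hypothetical text as an OCR step or vision LLM might return it for an invoice table.
EXTRACTED_TEXT = """\
Item            Qty   Unit price
Widget A        3     19.99
Widget B        10    4.50
Shipping        1     12.00
Wdgt C          two   7.25
"""

# Item name, then 2+ spaces, an integer quantity, and a price with two decimals.
LINE_ITEM = re.compile(r"^(?P<item>.+?)\s{2,}(?P<qty>\d+)\s+(?P<price>\d+\.\d{2})$")


def parse_line_items(text):
    """Return (parsed items, rejected lines); never guess at a malformed row."""
    items, rejected = [], []
    for line in text.splitlines()[1:]:  # skip the header row
        line = line.strip()
        if not line:
            continue
        match = LINE_ITEM.match(line)
        if match:
            items.append((match["item"], int(match["qty"]), Decimal(match["price"])))
        else:
            rejected.append(line)
    return items, rejected


def invoice_total(items):
    """Exact decimal arithmetic over the rows that parsed."""
    return sum((qty * price for _item, qty, price in items), Decimal("0"))


items, rejected = parse_line_items(EXTRACTED_TEXT)
total = invoice_total(items)
print(total, rejected)
```

The LLM, if one is used at all, only sees the total and the rejected rows afterward; it never does the addition.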

The Daily Signal piece on larger context windows not fixing RAG drives this home: throwing more tokens at a problem doesn’t solve architectural issues. You need the right tool for each layer.

What You Actually Need to Learn

If you’re starting today, don’t skip traditional ML fundamentals. Learn:

How to work with tabular data and build features
XGBoost or LightGBM for structured prediction
Statistical thinking and evaluation metrics that matter (not just accuracy)
When determinism matters and when flexibility does
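On the "not just accuracy" point, a quick sketch shows why. The numbers are made up: 100 customers, 10 of whom churn. A baseline that predicts "nobody churns" scores 90% accuracy while catching no churners at all.

```python
def evaluate(y_true, y_pred):
    """Accuracy, precision, and recall for binary labels (1 = churned)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical labels: 10 churners followed by 90 retained customers.
y_true = [1] * 10 + [0] * 90

# Baseline: always predict "stays".
baseline_pred = [0] * 100

# A made-up model: catches 7 of 10 churners and raises 5 false alarms.
model_pred = [1] * 7 + [0] * 3 + [1] * 5 + [0] * 85

baseline = evaluate(y_true, baseline_pred)
model_scores = evaluate(y_true, model_pred)
print(baseline)
print(model_scores)
```

The two accuracy figures are nearly identical, while recall separates a useless predictor from a useful one. That gap is what the choice of metric is supposed to surface.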

Then layer LLMs on top. You’ll recognize when you’re using an LLM as a hammer hitting every nail, and you’ll have the judgment to reach for SQL or a trained model instead.

The practitioners shipping reliable systems in production aren’t doing pure-LLM or pure-ML. They’re mixing them pragmatically. A data scientist who only knows LLMs tends to build slow, expensive, unreliable systems. One who only knows traditional ML tends to miss genuinely useful LLM applications.

Bottom line: Learn both. Use traditional ML for problems with structure, reproducibility requirements, or clear ground truth. Use LLMs for flexibility, speed to production, and tasks requiring reasoning or generation. Most real systems need both, used in the right places.

Question via Hacker News