The Three Systems Every ML Engineer Should Understand

One of the biggest mindset shifts in my ML career was realizing this:
Most ML models don’t operate in isolation — they live in systems.
If you want to build impactful, scalable machine learning products, there are three systems you need to understand deeply.
1. Retrieval Systems
These systems answer the question: What are the candidate items?
In search, recommendations, and support, retrieval is about narrowing the universe to a manageable candidate set.
- Often rule-based, embedding-based, or hybrid
- Must be fast and scalable
- Acts as the “recall” stage of your pipeline
📘 Think: ANN search with FAISS, BM25, or vector databases
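To make the recall stage concrete, here's a minimal sketch of embedding-based retrieval using brute-force cosine similarity. The corpus and query vectors are made up for illustration; at real scale you'd swap the brute-force scan for an ANN index like FAISS, which trades exactness for speed.

```python
import numpy as np

def retrieve(query_vec: np.ndarray, corpus: np.ndarray, k: int = 3) -> list[int]:
    """Return indices of the top-k corpus vectors by cosine similarity.

    Brute force works fine for small corpora; an ANN index (e.g. FAISS)
    replaces this scan when the corpus has millions of items.
    """
    q = query_vec / np.linalg.norm(query_vec)
    c = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    scores = c @ q  # cosine similarity of every item to the query
    return np.argsort(-scores)[:k].tolist()

# Toy corpus: 5 items in a 4-dim embedding space (hypothetical values).
corpus = np.array([
    [0.9, 0.1, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.8, 0.2, 0.1, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.1, 0.0, 0.0, 1.0],
])
query = np.array([1.0, 0.0, 0.0, 0.0])
print(retrieve(query, corpus, k=2))  # → [0, 2]
```

The key property: this stage only has to be fast and high-recall. Precision is the ranker's job.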
2. Ranking Systems
Once you have candidates, you need to sort them.
That's where ranking systems come in: ML models trained on relevance, engagement, or user-feedback signals.
- Features are critical here
- You’ll use metrics like NDCG, MAP, precision@k
- May involve pairwise or listwise loss functions
💡 This is where most ML energy gets spent — but it’s just one part of the stack.
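Since NDCG is one of the metrics mentioned above, here's a small self-contained sketch of NDCG@k in plain Python, assuming graded relevance labels (the example list is made up):

```python
import math

def dcg_at_k(relevances: list[float], k: int) -> float:
    """Discounted cumulative gain over the top-k results."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances: list[float], k: int) -> float:
    """NDCG@k: DCG of the ranked list divided by DCG of the ideal ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# A ranking that places a mildly relevant item above the most relevant one.
print(round(ndcg_at_k([1, 3, 0, 2], k=4), 3))  # → 0.788
```

The log discount is what makes ranking metrics position-aware: a relevant item at rank 1 is worth far more than the same item at rank 4, which is exactly what pairwise and listwise losses try to optimize during training.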
3. Re-Ranking Systems
These are lightweight, fast models that refine results right before the user sees them.
- Often rule-based, interpretable, and latency-conscious
- May inject business rules, safety checks, or personalization tweaks
- Critical for real-time environments (chatbots, voice, UI suggestions)
⚙️ A good re-ranker is the bridge between engineering constraints and ML intent.
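A rule-based re-ranker like the one described above can be sketched in a few lines. The field names (`score`, `unsafe`, `out_of_stock`) and the penalty value are hypothetical; the point is the shape: filter for safety, adjust model scores with business rules, re-sort.

```python
def rerank(candidates: list[dict]) -> list[dict]:
    """Apply lightweight business rules on top of ranker scores.

    Each candidate carries a 'score' from the ranking model plus flags;
    the exact fields and penalty are illustrative, not a real schema.
    """
    # Safety check: drop anything flagged by upstream moderation.
    safe = [c for c in candidates if not c.get("unsafe", False)]

    # Business rule: penalize out-of-stock items instead of hiding them.
    def adjusted(c: dict) -> float:
        penalty = 0.5 if c.get("out_of_stock") else 0.0
        return c["score"] - penalty

    return sorted(safe, key=adjusted, reverse=True)

items = [
    {"id": "a", "score": 0.9, "out_of_stock": True},
    {"id": "b", "score": 0.7},
    {"id": "c", "score": 0.8, "unsafe": True},
]
print([c["id"] for c in rerank(items)])  # → ['b', 'a']
```

Because it's just filters and sorts over a small candidate list, this stage stays interpretable and adds microseconds, not milliseconds, which is what makes it viable right before the response is served.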
Why These Matter
In real-world ML, systems beat models.
Understanding how retrieval, ranking, and re-ranking work together helps you:
- Ask better product questions
- Design scalable infra
- Prioritize latency vs accuracy trade-offs
And most importantly: ship ML that works.
Let me know which system you want me to go deep on next — I’m planning future posts that unpack each of them.