Input
- Tokenization
- Tokenizer in Modern LLMs
- BPE Encoding Complexity and Optimization
- Input Embedding in Modern Transformers
- LLM Hyperparameters
- About LLM Precision
Architecture
- Transformer in LLM
- Activated Params in MoE Models
- DeepSeek V4 Architecture Tricks
- Rotary Positional Embedding - Detailed Explanation
- Causal Self-Attention
- Multi-Head Attention
- Self-Attention Detailed Example