- Transformer Core
  - Transformer in LLM
  - Self-Attention Detail Example
  - Causal Self Attention
  - Multi-Head Attention
- Position and Context
  - Rotary Positional Embedding - Detail Explanation
- MoE and Modern Model Notes
  - Activated Params in MoE Models
  - DeepSeek V4 Architecture Tricks
- Parent MOC
  - Large Language Model - MOC