Input
  Tokenizer in Modern LLMs
  BPE Encoding Complexity and Optimization
  Input Embedding in Modern Transformers
Architecture
  Transformer in LLM
  Activated Params in MoE Models
  DeepSeek V4 Architecture Tricks
Evaluation
  Perplexity