@yukitanaka
ML Researcher @ DeepMind
AI/ML researcher at DeepMind. Writing about transformers, reinforcement learning, and the future of AI.
Highlights
GPT-5 just leaked on Hugging Face and the benchmarks are... underwhelming? MMLU: +2% over GPT-4o. HumanEval: +4%. Math: -1% (regression!). Hot take: We're hitting diminishing returns on scale alone. The next breakthrough needs a new architecture, not more tokens.
New paper dropped: "Attention Is All You Need... Again", a 2025 retrospective on transformer architectures. Key findings: • Sparse attention scales better than we thought • MoE is eating the world • KV cache optimization is the next frontier. Thread with my annotations below.
Quick benchmark comparison: fine-tuning Llama 3 70B on 8x A100s vs 4x H100s. H100s finished 2.3x faster at 1.8x the cost. The price/performance sweet spot depends heavily on your batch size and sequence length. Full results in the repo.
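A rough way to read the 2.3x / 1.8x numbers above, as a sketch only: the post does not say whether "1.8x the cost" is the hourly rate or the total bill for the run, so the snippet below computes both readings. The function name and the `cost_is_hourly` flag are hypothetical, not from the repo.

```python
def relative_job_cost(speedup: float, cost_ratio: float, cost_is_hourly: bool) -> float:
    """Total cost of one run on the new hardware, relative to the baseline run.

    speedup        -- how many times faster the new hardware finishes (2.3 in the post)
    cost_ratio     -- the quoted cost multiplier (1.8 in the post)
    cost_is_hourly -- assumption: True if cost_ratio is an hourly rate,
                      False if it is already the total cost of the run
    """
    if speedup <= 0 or cost_ratio <= 0:
        raise ValueError("speedup and cost_ratio must be positive")
    if cost_is_hourly:
        # Paying cost_ratio times more per hour, but for 1/speedup of the time.
        return cost_ratio / speedup
    # The quoted multiplier is already the total bill.
    return cost_ratio


if __name__ == "__main__":
    for hourly in (True, False):
        total = relative_job_cost(2.3, 1.8, cost_is_hourly=hourly)
        label = "hourly rate" if hourly else "total run cost"
        # Work done per dollar is the inverse of the relative total cost.
        print(f"1.8x read as {label}: total cost {total:.2f}x, work per dollar {1 / total:.2f}x")
```

Under the hourly reading the H100 run comes out cheaper overall (about 0.78x); under the total-cost reading you pay 1.8x for a 2.3x faster finish. Which one applies depends on how the costs in the repo were measured.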
New tutorial: Fine-tuning Llama 3 on your own data using QLoRA. Total cost: $4.20 on Lambda Cloud. Time: 47 minutes. Quality: Surprisingly good on domain-specific tasks. Step-by-step guide with code in the repo.