GLM-5 scored 50 on the Intelligence Index, making it the new open-weights leader, with 66K+ downloads; it was released alongside MiniMax M2.5, both targeting long-horizon agentic engineering, and Z.ai publicly stated it is GPU-starved.
GLM-5's 50 on the Intelligence Index sets a new open-weights bar — next bottleneck is GPU supply for inference at scale, as Z.ai has openly acknowledged.
9 sources
- reddit GLM-5 scores 50 on the Intelligence Index and is the new... 638pts
- reddit GLM-5 Officially Released 773pts
- hn GLM-5: From Vibe Coding to Agentic Engineering 378pts
- hn GLM-5: Targeting complex systems engineering and... 479pts
- reddit GLM 5 Released 613pts
- huggingface zai-org/GLM-5
- reddit GLM 5.0 & MiniMax 2.5 Just Dropped, Are We Entering... 260pts
- reddit MiniMax M2.5 Released 269pts
- reddit Z.ai said they are GPU starved, openly. 1467pts
Nanbeige4.1-3B explores whether a 3B model can reason, align, and act as a general model; MiniCPM-SALA (426 likes, 2569 downloads) targets similar small-model general capability — both push the floor of useful model size.
Nanbeige4.1-3B and MiniCPM-SALA both target general capability at 3B scale — next bottleneck is whether agentic tool-use and multi-step reasoning hold up at this size.
2 sources
- reddit Nanbeige4.1-3B: A Small General Model that Reasons,... 156pts
- huggingface openbmb/MiniCPM-SALA
Multiple users report that Flux.2 Klein 9B outperforms Qwen Image Edit for editing consistency and LoRA trainability at 4 inference steps, with successful style LoRAs trained at rank 32 over 7,000 steps on Runpod.
5 sources
- reddit Who else left Qwen Image Edit for Flux 2 Klein 115pts
- reddit I continue to be impressed by Flux.2 Klein 9B's trainability 100pts
- reddit Google Street View 2077 (Klein 9b distilled edit) 136pts
- reddit DC Ancient Futurism Style 1 875pts
- reddit ZImageTurboProgressiveLockedUpscale (Works with Z Image... 87pts
A ComfyUI custom node renders pose, depth, normal, and canny batches from FBX/GLB animation files (e.g. Mixamo rigs) in an interactive 3D viewport for ControlNet conditioning.
1 source
- reddit interactive 3D Viewport node to render Pose, Depth,... 227pts
GameDevBench evaluates multimodal coding agents on game development, FeatureBench benchmarks agentic coding for complex feature development, and CodeRLM uses tree-sitter indexing to improve how LLM agents navigate codebases.
Multiple benchmarks now test agents on multi-file feature-level coding rather than single-function tasks — next bottleneck is reliable multi-step planning across large codebases.
MetaphorStar applies visual RL to image metaphor understanding, while Reinforced Curriculum Pre-Alignment uses an RL-style curriculum for domain-adaptive VLMs — both use reinforcement signals to push visual reasoning beyond supervised fine-tuning.
2 sources
A six-month follow-up on the Attempt-to-Persuade Eval shows GPT and Claude models improved at resisting harmful persuasion while Gemini regressed.
1 source
Three papers independently address VLA model brittleness in contact-rich manipulation: RISE adds a compositional world model for self-improvement, ABot-M0 uses action manifold learning across hardware, and MolmoSpaces provides a large-scale ecosystem for navigation/manipulation.
Three concurrent papers attack VLA fragility in dynamic manipulation via world models and action manifolds — next bottleneck is sim-to-real transfer fidelity for contact-rich tasks.
3 sources
MOSS-Audio-Tokenizer scales discrete audio tokenization for future audio foundation models (47 likes), while Voxtral Realtime achieves sub-second latency streaming ASR matching offline quality — both address the bottleneck of integrating audio natively into LLM architectures.
MOSS-Audio-Tokenizer targets scaling tokenizers beyond pretrained codec limitations while Voxtral hits sub-second streaming latency — next bottleneck is joint speech understanding and generation in a single LLM pass.
2 sources
- paperswithcode MOSS-Audio-Tokenizer: Scaling Audio Tokenizers for...
- paperswithcode Voxtral Realtime