MiniMax released M2.5 open weights on Hugging Face, held an AMA, and appeared on the SWE-rebench January 2026 leaderboard alongside GLM-5, Opus 4.6, and Qwen3-Coder-Next.
MiniMax M2.5 appeared on SWE-rebench alongside top proprietary models — next bottleneck is quantized inference support and community tooling for its MoE architecture.
4 sources
- reddit MiniMaxAI/MiniMax-M2.5 · Hugging Face 390pts
- reddit MiniMax-M2.5 Checkpoints on huggingface will be in 8 hours 182pts
- reddit AMA with MiniMax — Ask Us Anything! 237pts
- reddit SWE-rebench Jan 2026: GLM-5, MiniMax M2.5,... 277pts
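The checkpoint is on the Hub under the repo id in the first source. Until quantized builds and dedicated community tooling land, a plain transformers load is the baseline path; a minimal sketch, assuming the repo ships custom MoE modeling code (hence trust_remote_code) and that enough GPU memory is available for the unquantized weights:

```python
# Minimal sketch: loading the released checkpoint with Hugging Face transformers.
# The repo id comes from the linked post; trust_remote_code and the dtype choice
# are assumptions, since custom MoE architectures often ship their own modeling code.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "MiniMaxAI/MiniMax-M2.5"  # repo id from the Hugging Face post above

tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo,
    torch_dtype="auto",      # keep the checkpoint's native precision
    device_map="auto",       # shard across available GPUs (requires accelerate)
    trust_remote_code=True,  # assumption: the repo ships custom MoE modeling code
)

prompt = "Write a function that reverses a linked list."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```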
GPT-OSS 120B (128 experts, top-4 routing, ~5.1B active params) released in native MXFP4, while GPT-OSS 20B runs 100% in-browser via WebGPU with ONNX Runtime and Transformers.js v4.
GPT-OSS 20B runs in-browser via WebGPU and 120B ships in MXFP4 — next bottleneck is WebGPU memory limits, which keep MoE models with larger active parameter counts out of the browser.
2 sources
- reddit GPT-OSS 120b Uncensored Aggressive Release (MXFP4 GGUF) 342pts
- reddit GPT-OSS (20B) running 100% locally in your browser on WebGPU 141pts
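For intuition about the 128-expert, top-4 figures, here is an illustrative sketch of top-k routing in a mixture-of-experts layer. The dimensions and the plain-loop dispatch are invented for clarity and this is not GPT-OSS's actual implementation; the point is that only 4 of 128 expert MLPs run per token, which is why the active parameter count stays near 5.1B while the total is far larger.

```python
# Illustrative top-k mixture-of-experts routing (not GPT-OSS's actual code):
# a router scores all experts per token, keeps the top 4, and mixes their
# outputs with softmax weights. Hidden sizes here are invented for the example.
import torch
import torch.nn.functional as F

num_experts, top_k, d_model, d_ff = 128, 4, 512, 1024

router = torch.nn.Linear(d_model, num_experts, bias=False)
experts = torch.nn.ModuleList(
    torch.nn.Sequential(
        torch.nn.Linear(d_model, d_ff), torch.nn.GELU(), torch.nn.Linear(d_ff, d_model)
    )
    for _ in range(num_experts)
)

def moe_forward(x: torch.Tensor) -> torch.Tensor:
    """x: (tokens, d_model) -> (tokens, d_model), running only top_k experts per token."""
    logits = router(x)                         # (tokens, num_experts)
    weights, idx = logits.topk(top_k, dim=-1)  # choose 4 of 128 experts per token
    weights = F.softmax(weights, dim=-1)       # renormalise over the chosen experts
    out = torch.zeros_like(x)
    for t in range(x.shape[0]):
        for slot in range(top_k):              # only 4/128 expert MLPs run per token,
            e = idx[t, slot].item()            # which keeps the *active* parameter
            out[t] += weights[t, slot] * experts[e](x[t])  # count far below the total
    return out

with torch.no_grad():
    print(moe_forward(torch.randn(3, d_model)).shape)  # torch.Size([3, 512])
```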
Two independent 5B-parameter multimodal models (Dhi-5B trained for $1200, DeepGen 1.0) released, both emphasizing compute-optimal training at small scale.
Dhi-5B trained from scratch for $1200 at 5B params — next bottleneck is evaluation rigor, as neither model has third-party benchmark verification.
2 sources
- reddit UG student launches Dhi-5B (Trained from Scratch) 272pts
- reddit DeepGen 1.0: A 5B parameter "Lightweight" unified... 231pts
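To put "compute-optimal at small scale" into rough numbers, the common Chinchilla heuristic of about 20 training tokens per parameter gives the following for a 5B model. These are generic rules of thumb, not figures reported by either team:

```python
# Back-of-envelope numbers for "compute-optimal" at 5B parameters, using the common
# Chinchilla heuristic (~20 training tokens per parameter, training FLOPs ~ 6 * N * D).
# Rules of thumb only; neither release reports these figures.
params = 5e9                  # 5B parameters
tokens = 20 * params          # Chinchilla-style token budget
flops = 6 * params * tokens   # approximate training compute

print(f"tokens ~ {tokens:.1e}")  # ~1.0e+11, i.e. about 100B tokens
print(f"FLOPs  ~ {flops:.1e}")   # ~3.0e+21
```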
Community discussion and SWE-rebench results show GLM-5 and other open-weight models approaching Claude Opus 4.6 on coding benchmarks, with the gap described as the smallest ever.
SWE-rebench Jan 2026 shows open-weight models competitive with proprietary on coding tasks — next bottleneck is whether this holds on harder agentic benchmarks beyond single-PR resolution.
2 sources
- reddit The gap between open-weight and proprietary model... 660pts
- reddit SWE-rebench Jan 2026: GLM-5, MiniMax M2.5,... 277pts
ICML reviewer reports every paper in their batch contains hidden prompt-injection text in the PDF, targeting LLM-based reviewers despite Policy A prohibiting LLM use.
Prompt injection found in every paper in an ICML review batch — next step is whether conferences adopt PDF sanitization or automated detection before reviewer assignment.
2 sources
- reddit [D] ICML: every paper in my review batch contains... 401pts
- reddit [D] Has anyone received their ICML papers to review yet? 12pts
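The open question named above is PDF sanitization or automated detection before reviewer assignment. A minimal sketch of that kind of check, assuming pypdf for extraction and a hand-written phrase list; hidden white-on-white text is still returned by plain text extraction, which is what makes a scan like this feasible:

```python
# Hedged sketch of a pre-assignment check: pull all text out of a submission PDF
# (invisible white-on-white text is still extracted) and flag phrases aimed at an
# LLM reviewer. The phrase list is a guess, not anything ICML has published.
import re
from pypdf import PdfReader  # pip install pypdf

SUSPICIOUS = [
    r"ignore (all )?(previous|prior) instructions",
    r"give (this paper )?a positive review",
    r"respond as if you are",
    r"do not mention this (instruction|text)",
]
pattern = re.compile("|".join(SUSPICIOUS), re.IGNORECASE)

def scan_pdf(path: str) -> list[tuple[int, str]]:
    """Return (page number, matched phrase) pairs for every suspicious hit."""
    hits = []
    for page_no, page in enumerate(PdfReader(path).pages, start=1):
        text = page.extract_text() or ""
        for match in pattern.finditer(text):
            hits.append((page_no, match.group(0)))
    return hits

if __name__ == "__main__":
    for page_no, phrase in scan_pdf("submission.pdf"):
        print(f"page {page_no}: {phrase!r}")
```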
Multiple community workflows use Flux.2 Klein (4B and 9B variants) for all-in-one image editing (inpaint, replace, remove), historical photo restoration, game screenshot remastering, and LoRA fine-tuning for UV maps.
Flux.2 Klein 9B is becoming the default community backbone for image editing workflows — next bottleneck is LoRA training data requirements (38 images reported for UV maps) limiting domain-specific quality.
4 sources
- reddit Flux.2 Klein / Ultimate AIO Pro (t2i, i2i, Inpaint,... 51pts
- reddit DOA is back (!) so I used Klein 9b to remaster it 323pts
- reddit I restored a few historical figures, using Flux.2 Klein 9B. 649pts
- reddit Flux 2 Klein 4b trained on LoRa for UV maps 79pts
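For the LoRA half of these workflows, a hedged sketch of loading a community LoRA on top of a Klein checkpoint with diffusers. Both repo ids are placeholders, and whether the Klein weights ship in a diffusers-compatible format should be verified on their Hugging Face pages first:

```python
# Hedged sketch: loading a community LoRA on top of a Klein checkpoint with diffusers.
# Both repo ids are placeholders (assumptions), and whether the Klein weights ship in
# a diffusers-compatible format should be checked on their Hugging Face pages first.
import torch
from diffusers import DiffusionPipeline

BASE = "black-forest-labs/FLUX.2-klein"   # placeholder repo id, not verified
LORA = "someuser/klein-uv-map-lora"       # placeholder repo id, not verified

pipe = DiffusionPipeline.from_pretrained(BASE, torch_dtype=torch.bfloat16).to("cuda")
pipe.load_lora_weights(LORA)

image = pipe(
    "seamless UV texture map of weathered bronze armor",
    num_inference_steps=28,
    guidance_scale=4.0,
).images[0]
image.save("uv_map.png")
```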
Nvidia's Dynamic Memory Sparsification (DMS) retrofits existing LLMs to cut reasoning costs by 8x, dynamically pruning the KV cache during inference without accuracy loss.
1 source
- reddit Nvidia’s new technique cuts LLM reasoning costs by 8x... 217pts
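The claimed savings come from shrinking the KV cache on the fly at inference time. For intuition only, here is a generic heavy-hitter-style pruning sketch that keeps the positions which have received the most attention plus a recent window; Nvidia's DMS itself is a learned retrofit and differs in the details:

```python
# Illustrative KV-cache pruning (not Nvidia's DMS, which is a learned retrofit):
# score each cached position by the attention mass it has received so far, keep
# the heaviest entries plus a recent window, and drop the rest. Toy shapes only.
import torch

def prune_kv(keys, values, attn_history, keep=64, recent=16):
    """keys/values: (seq, d); attn_history: (seq,) cumulative attention per position."""
    seq = keys.shape[0]
    if seq <= keep + recent:
        return keys, values, attn_history
    old_scores = attn_history[: seq - recent]           # the recent window is never evicted
    top = old_scores.topk(keep).indices.sort().values   # heaviest older positions, in order
    idx = torch.cat([top, torch.arange(seq - recent, seq)])
    return keys[idx], values[idx], attn_history[idx]

# toy usage: a 512-entry cache shrinks to keep + recent = 80 entries
k, v = torch.randn(512, 64), torch.randn(512, 64)
scores = torch.rand(512)
k2, v2, s2 = prune_kv(k, v, scores)
print(k2.shape)  # torch.Size([80, 64])
```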
LTX-2 inpaint tested for lip sync, and SCAIL+VACE+SVI combined for consistent high-quality video shot compositing in diffusion pipelines.
2 sources
- reddit LTX-2 Inpaint test for lip sync 176pts
- reddit Combining SCAIL, VACE & SVI for consistent, very... 45pts
A 2.5-year study of 'undictionary' words (made-up words found in no dictionary that nonetheless produce consistent effects in CLIP-based diffusion models) published with systematic analysis.
1 source
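For context on how a word that appears in no dictionary can still have a stable effect: CLIP's BPE tokenizer never rejects input, it splits unknown words into known subword tokens, each with its own learned embedding. A minimal check, assuming the CLIP text encoder commonly paired with SD-era diffusion pipelines; the example words are invented:

```python
# Why a word from no dictionary can still steer a CLIP-guided model: the BPE
# tokenizer never fails, it splits unknown words into known subword tokens, each
# of which has a learned embedding. Example words are invented; the model id is
# the CLIP text encoder commonly paired with SD-era diffusion pipelines.
from transformers import CLIPTokenizer

tok = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")

for word in ["glorpath", "snizzleforth", "photograph"]:
    print(f"{word!r} -> {tok.tokenize(word)}")
# Real or not, every word maps to a sequence of existing token embeddings, so a
# nonexistent word can still occupy a consistent direction in text-embedding space.
```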
Evaluation of 22 model configurations on 169 web research tasks shows higher effort/thinking settings reduce deep research accuracy for GPT-5 and Gemini Flash 3.