2026-02-13 Signals
W62 MiniMax M2.5 open-weight MoE release and benchmarking

MiniMax released M2.5 open weights on HuggingFace and held an AMA; the model appeared on the January 2026 SWE-rebench leaderboard alongside GLM-5, Opus 4.6, and Qwen3-Coder-Next.

convergence 10/35
implementation 25/30
engagement 15/15
significance 12/20

MiniMax M2.5 appeared on SWE-rebench alongside top proprietary models — next bottleneck is quantized inference support and community tooling for its MoE architecture.

4 sources
W60 Sparse MoE models running locally via quantization

GPT-OSS 120B (128 experts, top-4 routing, ~5.1B active params) was released in native MXFP4, while GPT-OSS 20B runs 100% in-browser via WebGPU with ONNX Runtime and Transformers.js v4.
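
To see where the ~5.1B active-parameter figure comes from: with top-4 routing, each token runs only 4 of the 128 experts, so only a small fraction of the total weights is active per step. A minimal sketch of top-k expert routing, illustrative only and not GPT-OSS's actual router code:

```python
import numpy as np

def topk_route(hidden, router_w, k=4):
    # hidden: (d,) one token's activation; router_w: (num_experts, d) router weights.
    logits = router_w @ hidden                   # one score per expert
    top = np.argsort(logits)[-k:]                # pick the k highest-scoring experts
    w = np.exp(logits[top] - logits[top].max())  # softmax over the selected experts only
    return top, w / w.sum()                      # only these k experts execute

rng = np.random.default_rng(0)
num_experts, d = 128, 64
experts, weights = topk_route(rng.normal(size=d), rng.normal(size=(num_experts, d)))
print(experts, weights.round(3))  # 4 of 128 experts fire per token
```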

convergence 10/35
implementation 25/30
engagement 15/15
significance 10/20

GPT-OSS 20B runs in-browser via WebGPU and 120B ships in MXFP4 — next bottleneck is WebGPU memory limits preventing larger MoE active parameter counts in browser.

2 sources
W58 Low-cost training of small multimodal models from scratch

Two independent 5B-parameter multimodal models, Dhi-5B (trained for $1200) and DeepGen 1.0, were released, both emphasizing compute-optimal training at small scale.

convergence 10/35
implementation 25/30
engagement 15/15
significance 8/20

Dhi-5B trained from scratch for $1200 at 5B params — next bottleneck is evaluation rigor, as neither model has third-party benchmark verification.

2 sources
2026-02-13 Tracking
W27 Open-weight models closing gap with proprietary frontier

Community discussion and SWE-rebench results show GLM-5 and other open-weight models approaching Claude Opus 4.6 on coding benchmarks, with the gap described as the smallest ever.

convergence 0/35
implementation 0/30
engagement 15/15
significance 12/20

SWE-rebench Jan 2026 shows open-weight models competitive with proprietary on coding tasks — next bottleneck is whether this holds on harder agentic benchmarks beyond single-PR resolution.

2 sources
W27 Prompt injection in academic peer review

An ICML reviewer reports that every paper in their batch contains hidden prompt-injection text in the PDF, targeting LLM-based reviewers despite Policy A prohibiting LLM use.

convergence 0/35
implementation 0/30
engagement 15/15
significance 12/20

Prompt injection found in every paper in an ICML review batch — next step is whether conferences adopt PDF sanitization or automated detection before reviewer assignment.

2 sources
W22 Flux.2 Klein for image editing and restoration workflows

Multiple community workflows use Flux.2 Klein (4B and 9B variants) for all-in-one image editing (inpaint, replace, remove), historical photo restoration, game screenshot remastering, and LoRA fine-tuning for UV maps.

convergence 0/35
implementation 0/30
engagement 15/15
significance 7/20

Flux.2 Klein 9B is becoming the default community backbone for image editing workflows — next bottleneck is LoRA training data requirements (38 images reported for UV maps) limiting domain-specific quality.

4 sources
W19 KV-cache sparsification for inference cost reduction

Nvidia's Dynamic Memory Sparsification (DMS) retrofits existing LLMs to cut reasoning costs by 8x, dynamically pruning the KV cache during inference without accuracy loss.
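
The general mechanism, evicting cache slots that receive little attention, can be sketched in a few lines. This is a generic score-based KV-cache pruning sketch, not Nvidia's actual DMS algorithm, which is only described at a high level here:

```python
import numpy as np

def prune_kv(keys, values, attn_mass, keep_ratio=0.125):
    # keys/values: (T, d) cached tensors; attn_mass: (T,) accumulated
    # attention each slot has received. Keep only the most-used slots.
    keep = max(1, int(len(keys) * keep_ratio))
    idx = np.sort(np.argsort(attn_mass)[-keep:])  # top slots, original order
    return keys[idx], values[idx]

rng = np.random.default_rng(0)
k, v = rng.normal(size=(1024, 64)), rng.normal(size=(1024, 64))
k2, v2 = prune_kv(k, v, rng.random(1024))
print(k.shape, "->", k2.shape)  # (1024, 64) -> (128, 64): an 8x smaller cache
```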

convergence 0/35
implementation 0/30
engagement 8/15
significance 11/20
1 source
W14 Video inpainting for lip sync and compositing

LTX-2 inpainting was tested for lip sync, and SCAIL+VACE+SVI were combined for consistent, high-quality video shot compositing in diffusion pipelines.

convergence 0/35
implementation 0/30
engagement 8/15
significance 6/20
2 sources
W12 Nonexistent token effects in CLIP embedding space

A 2.5-year study on 'undictionary' words — nonexistent tokens that produce consistent effects in CLIP-based diffusion models — was published with systematic analysis.

convergence 0/35
implementation 0/30
engagement 5/15
significance 7/20
1 source
W10 Higher compute effort degrading LLM accuracy

An evaluation of 22 model configurations on 169 web research tasks shows that higher effort/thinking settings reduce deep research accuracy for GPT-5 and Gemini Flash 3.

convergence 0/35
implementation 0/30
engagement 0/15
significance 10/20
1 source
FAQ
What is HiddenState?

A daily briefing that scrapes 9 source types across the ML ecosystem, filters out the noise, and clusters what remains by technical mechanism — not topic.

Most ML news is recycled press releases. HiddenState watches for convergence: when multiple independent sources start working on the same bottleneck, something real is happening. Everything else is noise.

The top 10 mechanisms are ranked by W-index and split into Signals (strongest evidence) and Tracking (early signals worth watching) at the largest natural score gap.

What is W-index?

A 0–100 score measuring signal strength. Higher = more evidence that something real is happening.

Component | Max | What it measures
Convergence | 35 | How many independent sources report this. Single source = 0 — unless it links to working code, which counts as a second data point.
Implementation | 30 | Evidence of working code. GitHub repo = 30. HuggingFace model = 20. Paper only = 0.
Engagement | 15 | Upvotes, stars, points. Capped low so hype can't inflate the score.
Significance | 20 | Clustering model's assessment of technical importance.

W60+ strong — W25-59 moderate — W<25 early/weak
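
Assuming the score is the plain capped sum of the four components (the exact formula isn't published here), the arithmetic looks like this:

```python
def w_index(convergence, implementation, engagement, significance):
    # Assumed: W-index is the capped sum of the four components above.
    score = (min(convergence, 35) + min(implementation, 30)
             + min(engagement, 15) + min(significance, 20))
    band = "strong" if score >= 60 else "moderate" if score >= 25 else "early/weak"
    return score, band

print(w_index(10, 25, 15, 12))  # (62, 'strong'): matches the W62 MiniMax item above
```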

Code beats vaporware. A shipped GitHub project with 3 sources will always outscore a hyped paper with 500 Reddit upvotes but no implementation.

Who are our sources?
Source | What we pull
arXiv | Preprints from cs.LG, cs.CL, cs.AI, cs.CV, stat.ML — the raw research firehose
Reddit | r/MachineLearning, r/LocalLLaMA, r/StableDiffusion, r/MLOps — practitioner signal
GitHub | Trending ML repos with 50+ stars — implementation evidence
Hacker News | ML-related posts with 15+ points — cross-domain attention
HuggingFace | Trending models + watched quantizers (bartowski, MaziyarPanahi, LoneStriker)
OpenReview | TMLR + NeurIPS workshops — peer-reviewed & bleeding-edge
Twitter | 9 curated accounts (akhaliq, karpathy, srush, fchollet, etc.)
Papers w/ Code | Trending papers with implementations — community-vetted research
RSS Blogs | Lilian Weng, Chip Huyen, Eugene Yan, Simon Willison, Interconnects, Latent Space, Netflix Tech + PyTorch & HF blogs

Items that appear across multiple sources score higher. Single-source items start at zero convergence.

Signals vs Tracking — what's the difference?

Both sections show real signals. Up to 10 mechanisms are sorted by W-index and split at the largest natural score gap — Signals are above the gap, Tracking below. The split point changes daily based on the data; tied scores always land on the same side.
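
A sketch of that split rule, assuming a straightforward largest-gap cut over the sorted scores (the exact implementation isn't published):

```python
def split_signals_tracking(scores):
    # Sort descending, cut at the biggest drop between adjacent *distinct*
    # scores, so tied scores can never be separated.
    s = sorted(scores, reverse=True)
    gaps = [(s[i] - s[i + 1], i) for i in range(len(s) - 1) if s[i] != s[i + 1]]
    if not gaps:
        return s, []               # everything tied: all Signals
    _, cut = max(gaps)             # position of the largest gap
    return s[:cut + 1], s[cut + 1:]

signals, tracking = split_signals_tracking([62, 60, 58, 27, 27, 22, 19, 14, 12, 10])
print(signals)   # [62, 60, 58]: matches today's Signals section
print(tracking)  # [27, 27, 22, 19, 14, 12, 10]: today's Tracking section
```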

Tracking does not mean bad, unimportant, or wrong. It usually means a signal has fewer independent sources so far, or lacks public code — things that can change overnight. Some of the most consequential developments start in Tracking before the rest of the ecosystem catches up.

Likewise, a high W-index does not mean research is good, correct, or worth adopting. W-index measures visibility and convergence across sources, not quality. A flawed paper that gets widely discussed will score higher than a brilliant one nobody has noticed yet.

HiddenState is a detection tool, not an endorsement. It tells you where activity is clustering — what you do with that is up to you. Nothing here should be read as a recommendation, ranking of merit, or judgement on any researcher's work.

What does noise rejection mean?

Of all items collected, only 10 make it to the final briefing. The rejection rate is the percentage that got cut.

Filtering happens in three stages:

Stage | What gets cut
Pre-filter | Short abstracts, low-engagement posts, duplicates across sources
Clustering | Items that don't converge on a shared mechanism with other items
Ranking | Clusters below the top 10 by W-index
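
A toy version of the three-stage pipeline; thresholds and field names (abstract, engagement, mechanism, title, w) are invented for illustration:

```python
def daily_brief(items, top_n=10):
    # Stage 1: pre-filter short abstracts, low engagement, and duplicates.
    seen, kept = set(), []
    for it in items:
        if (len(it["abstract"]) >= 200 and it["engagement"] >= 5
                and it["title"] not in seen):
            seen.add(it["title"])
            kept.append(it)
    # Stage 2: cluster by mechanism; single-item clusters don't converge.
    clusters = {}
    for it in kept:
        clusters.setdefault(it["mechanism"], []).append(it)
    converged = [c for c in clusters.values() if len(c) >= 2]
    # Stage 3: rank clusters by W-index and keep the top N.
    converged.sort(key=lambda c: max(it["w"] for it in c), reverse=True)
    return converged[:top_n]
```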

A 99% rejection rate means 99 out of 100 items were noise. That's the point — most ML news doesn't matter on any given day.

Privacy
Data collection

None. HiddenState collects no personal data, no email addresses, no IP logs, no usage analytics, and no telemetry of any kind.

Cookies & tracking

Zero cookies. No first-party, no third-party, no session cookies, no tracking pixels.

The only client-side storage is localStorage for your theme preference (dark/light). This never leaves your browser and contains no identifying information.

External requests

Pages load zero external scripts, fonts, stylesheets, or analytics. Everything is self-contained. The only outbound link is to Ko-fi if you choose to click it.

Data sources

HiddenState monitors 9 distinct public data streams (arXiv, GitHub, Reddit, etc.) to detect cross-platform convergence. We do not use private user data; we only analyze what the community has already published.