/ ? ! $
2026-02-09 Signals
W56 Linear/sparse attention for efficient local LLM inference

Kimi-Linear-48B-A3B uses linear attention in a 48B model with only 3B active params now available as GGUF; TEAM accelerates MoE diffusion LLMs via temporal-spatial expert activation; OneVision-Encoder proposes codec-aligned sparsity for multimodal models.

convergence
15/35 implementation
25/30 engagement
6/15 significance
10/20

Kimi-Linear-48B-A3B-Instruct GGUF release shows linear attention models reaching local deployment — next bottleneck is quantization-aware kernel support in llama.cpp for non-softmax attention variants.

3 sources
W56 Local model enthusiasm for small efficient models

Step-3.5-Flash praised as strong for its size (140 upvotes); separate post (515 upvotes, 231 comments) discusses negative outlook for local LLM community, suggesting tension between cloud and local deployment economics.

convergence
10/35 implementation
25/30 engagement
15/15 significance
6/20
2 sources
W52 Efficient SAM variants for real-time video segmentation

Efficient-SAM2 accelerates SAM2 with object-aware visual encoding and memory retrieval for real-time video; SAM3 node update adds text-prompt detection and background removal in ComfyUI workflows.

convergence
15/35 implementation
25/30 engagement
4/15 significance
8/20
2 sources
W47 Adversarial attacks on LLM-based agents via prompt injection

Data exfiltration from messaging app agents via URL previews demonstrated; MUZZLE proposes agentic red-teaming of web agents against indirect prompt injection; StealthRL uses RL to evade multiple AI-text detectors simultaneously.

convergence
15/35 implementation
20/30 engagement
1/15 significance
11/20

Prompt injection attacks now demonstrated against deployed agent products (OpenClaw example) — next bottleneck is that defenses require input sanitization at the tool-call boundary, which no major agent framework standardizes yet.

3 sources
W44 Qwen model family local deployment and fine-tuning

Qwen3-Coder-Next praised as best general-purpose model at its size (530 upvotes), Qwen3.5 support merged in llama.cpp, abliterated GGUF variant published with 4865 downloads, and Qwen-Image-Edit LoRA trained for image style transfer.

convergence
0/35 implementation
20/30 engagement
15/15 significance
9/20

Qwen3.5 llama.cpp merge and abliterated GGUFs already shipping — next bottleneck is whether Qwen3.5 quantized variants maintain quality parity with full-precision on reasoning benchmarks.

4 sources
W43 Embodied chain-of-thought for robot manipulation policies

Self-supervised bootstrapping replaces rigid CoT templates in VLA models; dexterous manipulation policies learned from RGB human videos via 3D hand-object trajectory reconstruction; χ₀ addresses distributional inconsistencies as the primary bottleneck in long-horizon robotic manipulation.

convergence
15/35 implementation
20/30 engagement
0/15 significance
8/20

χ₀ identifies distributional inconsistency (not data scale) as the primary bottleneck for reliable long-horizon manipulation — next step is whether self-supervised CoT bootstrapping can close sim-to-real transfer gaps without domain-specific templates.

3 sources
W43 Autoregressive world models for robot control debate

Reddit discussion (38 upvotes, 43 comments) questions whether autoregressive video world models are the right foundation for robot control; Dreaming in Code uses foundation models to programmatically generate curriculum environments for open-ended learning.

convergence
15/35 implementation
20/30 engagement
1/15 significance
7/20
2 sources
2026-02-09 Tracking
W36 LLM agents for automated code reproducibility

Paper compares prompt-based vs agent-based approaches for automating computational reproducibility in social science; Agentseed generates AGENTS.md files from codebases to help AI coding agents understand repos.

convergence
15/35 implementation
15/30 engagement
0/15 significance
6/20
2 sources
W22 Structured context management for large schema LLM tasks

Paper explores structured context engineering for SQL schemas up to 10,000 tables across models; separate discussion identifies offline/async LLM workloads (eval pipelines, dataset labeling) as highest-volume use cases rather than latency-sensitive ones.

convergence
15/35 implementation
0/30 engagement
0/15 significance
7/20
2 sources
W15 Process reward models for visual chain-of-thought reasoning

Three papers independently address visual reasoning with structured intermediate steps: process reward models for thinking-with-images, annotation-free hierarchical synthetic CoT for VLMs, and adaptive test-time scaling with world models for spatial reasoning.

convergence
7/35 implementation
0/30 engagement
0/15 significance
8/20

CoTZero eliminates annotation dependency for visual CoT and process reward models now evaluate intermediate visual reasoning steps — next bottleneck is scaling test-time compute adaptively without fixed step budgets.

3 sources
FAQ
What is HiddenState?

A daily briefing that scrapes 8 source types across the ML ecosystem, filters out the noise, and clusters what remains by technical mechanism — not topic.

Most ML news is recycled press releases. HiddenState watches for convergence: when multiple independent sources start working on the same bottleneck, something real is happening. Everything else is noise.

The top 10 mechanisms are ranked by W-index and split into Signals (strongest evidence) and Tracking (early signals worth watching) at the largest natural score gap.

What is W-index?

A 0–100 score measuring signal strength. Higher = more evidence that something real is happening.

ComponentMaxWhat it measures
Convergence35How many independent sources report this. Single source = 0 — unless it links to working code, which counts as a second data point.
Implementation30Evidence of working code. GitHub repo = 30. HuggingFace model = 20. Paper only = 0.
Engagement15Upvotes, stars, points. Capped low so hype can't inflate the score.
Significance20Clustering model's assessment of technical importance.

W60+ strong — W25-59 moderate — W<25 early/weak

Code beats vaporware. A shipped GitHub project with 3 sources will always outscore a hyped paper with 500 Reddit upvotes but no implementation.

Who are our sources?
SourceWhat we pull
arxivPreprints from cs.LG, cs.CL, cs.AI, cs.CV, stat.ML — the raw research firehose
Redditr/MachineLearning, r/LocalLLaMA, r/StableDiffusion, r/MLOps — practitioner signal
GitHubTrending ML repos with 50+ stars — implementation evidence
Hacker NewsML-related posts with 15+ points — cross-domain attention
HuggingFaceTrending models + watched quantizers (bartowski, MaziyarPanahi, LoneStriker)
OpenReviewTMLR + NeurIPS workshops — peer-reviewed & bleeding-edge
Twitter9 curated accounts (akhaliq, karpathy, srush, fchollet, etc.)
Papers w/ CodeTrending papers with implementations — community-vetted research
RSS BlogsLilian Weng, Chip Huyen, Eugene Yan, Simon Willison, Interconnects, Latent Space, Netflix Tech + PyTorch & HF blogs

Items that appear across multiple sources score higher. Single-source items start at zero convergence.

Signals vs Tracking — what's the difference?

Both sections show real signals. Up to 10 mechanisms are sorted by W-index and split at the largest natural score gap — Signals are above the gap, Tracking below. The split point changes daily based on the data; tied scores always land on the same side.

Tracking does not mean bad, unimportant, or wrong. It usually means a signal has fewer independent sources so far, or lacks public code — things that can change overnight. Some of the most consequential developments start in Tracking before the rest of the ecosystem catches up.

Likewise, a high W-index does not mean research is good, correct, or worth adopting. W-index measures visibility and convergence across sources, not quality. A flawed paper that gets widely discussed will score higher than a brilliant one nobody has noticed yet.

HiddenState is a detection tool, not an endorsement. It tells you where activity is clustering — what you do with that is up to you. Nothing here should be read as a recommendation, ranking of merit, or judgement on any researcher's work.

What does noise rejection mean?

Of all items collected, only 10 make it to the final briefing. The rejection rate is the percentage that got cut.

Filtering happens in three stages:

StageWhat gets cut
Pre-filterShort abstracts, low-engagement posts, duplicates across sources
ClusteringItems that don't converge on a shared mechanism with other items
RankingClusters below the top 10 by W-index

A 99% rejection rate means 99 out of 100 items were noise. That's the point — most ML news doesn't matter on any given day.

Privacy
Data collection

None. HiddenState collects no personal data, no email addresses, no IP logs, no usage analytics, and no telemetry of any kind.

Cookies & tracking

Zero cookies. No first-party, no third-party, no session cookies, no tracking pixels.

The only client-side storage is localStorage for your theme preference (dark/light). This never leaves your browser and contains no identifying information.

External requests

Pages load zero external scripts, fonts, stylesheets, or analytics. Everything is self-contained. The only outbound link is to Ko-fi if you choose to click it.

Data sources

HiddenState monitors 9 distinct public data streams (ArXiv, GitHub, Reddit, etc.) to detect cross-platform convergence. We do not use private user data; we only analyze what the community has already published.