HiddenState — 2026-02-09

2026-02-09 Signals

W56 Linear/sparse attention for efficient local LLM inference ▸

Kimi-Linear-48B-A3B uses linear attention in a 48B model with only 3B active params now available as GGUF; TEAM accelerates MoE diffusion LLMs via temporal-spatial expert activation; OneVision-Encoder proposes codec-aligned sparsity for multimodal models.

convergence

15/35 implementation

25/30 engagement

6/15 significance

10/20

Kimi-Linear-48B-A3B-Instruct GGUF release shows linear attention models reaching local deployment — next bottleneck is quantization-aware kernel support in llama.cpp for non-softmax attention variants.

3 sources

reddit Kimi-Linear-48B-A3B-Instruct 152pts
arxiv TEAM: Temporal-Spatial Consistency Guided Expert...
arxiv OneVision-Encoder: Codec-Aligned Sparsity as a...

W56 Local model enthusiasm for small efficient models ▸

Step-3.5-Flash praised as strong for its size (140 upvotes); separate post (515 upvotes, 231 comments) discusses negative outlook for local LLM community, suggesting tension between cloud and local deployment economics.

convergence

10/35 implementation

25/30 engagement

15/15 significance

6/20

2 sources

reddit Step-3.5-Flash IS A BEAST 140pts
reddit Bad news for local bros 515pts

W52 Efficient SAM variants for real-time video segmentation ▸

Efficient-SAM2 accelerates SAM2 with object-aware visual encoding and memory retrieval for real-time video; SAM3 node update adds text-prompt detection and background removal in ComfyUI workflows.

convergence

15/35 implementation

25/30 engagement

4/15 significance

8/20

2 sources

arxiv Efficient-SAM2: Accelerating SAM2 with Object-Aware...
reddit SAM3-nOde uPdate 117pts

W47 Adversarial attacks on LLM-based agents via prompt injection ▸

Data exfiltration from messaging app agents via URL previews demonstrated; MUZZLE proposes agentic red-teaming of web agents against indirect prompt injection; StealthRL uses RL to evade multiple AI-text detectors simultaneously.

convergence

15/35 implementation

20/30 engagement

1/15 significance

11/20

Prompt injection attacks now demonstrated against deployed agent products (OpenClaw example) — next bottleneck is that defenses require input sanitization at the tool-call boundary, which no major agent framework standardizes yet.

3 sources

hn Data exfil from agents in messaging apps 33pts
arxiv MUZZLE: Adaptive Agentic Red-Teaming of Web Agents...
paperswithcode StealthRL: Reinforcement Learning Paraphrase Attacks for...

W44 Qwen model family local deployment and fine-tuning ▸

Qwen3-Coder-Next praised as best general-purpose model at its size (530 upvotes), Qwen3.5 support merged in llama.cpp, abliterated GGUF variant published with 4865 downloads, and Qwen-Image-Edit LoRA trained for image style transfer.

convergence

0/35 implementation

20/30 engagement

15/15 significance

9/20

Qwen3.5 llama.cpp merge and abliterated GGUFs already shipping — next bottleneck is whether Qwen3.5 quantized variants maintain quality parity with full-precision on reasoning benchmarks.

4 sources

reddit Do not Let the "Coder" in Qwen3-Coder-Next Fool You!... 530pts
reddit Qwen3.5 Support Merged in llama.cpp 234pts
huggingface bartowski/huihui-ai_Qwen3-Coder-Next-abliterated-GGUF
reddit Coloring Book Qwen Image Edit LoRA 461pts

W43 Embodied chain-of-thought for robot manipulation policies ▸

Self-supervised bootstrapping replaces rigid CoT templates in VLA models; dexterous manipulation policies learned from RGB human videos via 3D hand-object trajectory reconstruction; χ₀ addresses distributional inconsistencies as the primary bottleneck in long-horizon robotic manipulation.

convergence

15/35 implementation

20/30 engagement

0/15 significance

8/20

χ₀ identifies distributional inconsistency (not data scale) as the primary bottleneck for reliable long-horizon manipulation — next step is whether self-supervised CoT bootstrapping can close sim-to-real transfer gaps without domain-specific templates.

3 sources

arxiv Self-Supervised Bootstrapping of Action-Predictive...
arxiv Dexterous Manipulation Policies from RGB Human Videos...
paperswithcode χ_{0}: Resource-Aware Robust Manipulation via Taming...

W43 Autoregressive world models for robot control debate ▸

Reddit discussion (38 upvotes, 43 comments) questions whether autoregressive video world models are the right foundation for robot control; Dreaming in Code uses foundation models to programmatically generate curriculum environments for open-ended learning.

convergence

15/35 implementation

20/30 engagement

1/15 significance

7/20

2 sources

reddit [D] Are autoregressive video world models actually the... 38pts
paperswithcode Dreaming in Code for Curriculum Learning in Open-Ended Worlds

2026-02-09 Tracking

W36 LLM agents for automated code reproducibility ▸

Paper compares prompt-based vs agent-based approaches for automating computational reproducibility in social science; Agentseed generates AGENTS.md files from codebases to help AI coding agents understand repos.

convergence

15/35 implementation

15/30 engagement

0/15 significance

6/20

2 sources

W22 Structured context management for large schema LLM tasks ▸

Paper explores structured context engineering for SQL schemas up to 10,000 tables across models; separate discussion identifies offline/async LLM workloads (eval pipelines, dataset labeling) as highest-volume use cases rather than latency-sensitive ones.

convergence

15/35 implementation

0/30 engagement

0/15 significance

7/20

2 sources

W15 Process reward models for visual chain-of-thought reasoning ▸

Three papers independently address visual reasoning with structured intermediate steps: process reward models for thinking-with-images, annotation-free hierarchical synthetic CoT for VLMs, and adaptive test-time scaling with world models for spatial reasoning.

convergence

7/35 implementation

0/30 engagement

0/15 significance

8/20

CoTZero eliminates annotation dependency for visual CoT and process reward models now evaluate intermediate visual reasoning steps — next bottleneck is scaling test-time compute adaptively without fixed step budgets.

3 sources

Component	Max	What it measures
Convergence	35	How many independent sources report this. Single source = 0 — unless it links to working code, which counts as a second data point.
Implementation	30	Evidence of working code. GitHub repo = 30. HuggingFace model = 20. Paper only = 0.
Engagement	15	Upvotes, stars, points. Capped low so hype can't inflate the score.
Significance	20	Clustering model's assessment of technical importance.

Source	What we pull
arxiv	Preprints from cs.LG, cs.CL, cs.AI, cs.CV, stat.ML — the raw research firehose
Reddit	r/MachineLearning, r/LocalLLaMA, r/StableDiffusion, r/MLOps — practitioner signal
GitHub	Trending ML repos with 50+ stars — implementation evidence
Hacker News	ML-related posts with 15+ points — cross-domain attention
HuggingFace	Trending models + watched quantizers (bartowski, MaziyarPanahi, LoneStriker)
OpenReview	TMLR + NeurIPS workshops — peer-reviewed & bleeding-edge
Twitter	9 curated accounts (akhaliq, karpathy, srush, fchollet, etc.)
Papers w/ Code	Trending papers with implementations — community-vetted research
RSS Blogs	Lilian Weng, Chip Huyen, Eugene Yan, Simon Willison, Interconnects, Latent Space, Netflix Tech + PyTorch & HF blogs

Stage	What gets cut
Pre-filter	Short abstracts, low-engagement posts, duplicates across sources
Clustering	Items that don't converge on a shared mechanism with other items
Ranking	Clusters below the top 10 by W-index