Chaithu Talasila themoddedcube

turboquant-plus Public

TurboQuant+: 3-bit KV cache value quantization and group size optimization for long-context LLM inference

Python 1
ChannelQuant Public

Near-lossless 4× KV-cache compression for GQA models at ~4.1 bits/value — per-channel-key INT4 + static outlier ROM, with a reproducible reference model and paper.

Python
LonghornSilicon/kv-cache-engine Public

Hardware KV cache compression engine (SystemVerilog) using TurboQuant+ — keys at 4.25 bpv, values at ~3.0 bpv for 3–5× DRAM bandwidth reduction on LLM inference. Block 2 of the LonghornSilicon acce…

SystemVerilog 1
LonghornSilicon/adaptive-precision-attention Public

Entropy-guided mixed-precision attention: evolutionary search discovers that entropy is the optimal discriminator for per-block quantization decisions

Python
evoplace Public

LLM-guided evolutionary VLSI placement: beating DREAMPlace with evolved objective functions

Python 1
covenant Public

Contract-driven GPU analytical placer (DREAMPlace fork overlay) with a noise-calibrated evaluation protocol, paired multi-seed CIs, liveness gates, and calibration arms.

Python