Pinned
-
turboquant-plus (Public)
TurboQuant+: 3-bit KV cache value quantization and group size optimization for long-context LLM inference
Python 1
-
ChannelQuant (Public)
Near-lossless 4× KV-cache compression for GQA models at ~4.1 bits/value: per-channel-key INT4 + static outlier ROM, with a reproducible reference model and paper.
Python
-
LonghornSilicon/kv-cache-engine (Public)
Hardware KV cache compression engine (SystemVerilog) using TurboQuant+: keys at 4.25 bpv, values at ~3.0 bpv for 3–5× DRAM bandwidth reduction on LLM inference. Block 2 of the LonghornSilicon acce…
SystemVerilog 1
-
LonghornSilicon/adaptive-precision-attention (Public)
Entropy-guided mixed-precision attention: evolutionary search discovers that entropy is the optimal discriminator for per-block quantization decisions
Python
-
covenant (Public)
Contract-driven GPU analytical placer (DREAMPlace fork overlay) with a noise-calibrated evaluation protocol, paired multi-seed CIs, liveness gates, and calibration arms.
Python