
refactor: extract guess/rank/score into _guess.py #43

Open
Wolfvin wants to merge 1 commit into dhondta:main from Wolfvin:refactor/extract-guess-module


Wolfvin commented Jun 13, 2026


Summary

Decomposition refactoring of __common__.py (1510 lines split into 1182 + 343 lines), extracting the guess/rank/score logic into a dedicated _guess.py module.

What Changed

  • src/codext/__common__.py: 1510 → 1182 lines (codec registration and utilities remain)
  • src/codext/_guess.py: new, 343 lines (guess/rank/score logic extracted)

Why

  1. Single Responsibility: Guess/rank/score is distinct from codec registration
  2. Decomposition: 1510-line god object split into 2 focused modules
  3. Cohesion: All guessing logic now lives together
  4. Reduced Coupling: One-way dependency (_guess.py imports from __common__.py, not the reverse)
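The one-way dependency in point 4 could be guarded by a small test. The sketch below is an assumption, not part of this PR: it uses Python's ast module to list the names a module imports, and the module bodies are hypothetical inline strings rather than the real file contents (a real test would read src/codext/_guess.py and src/codext/__common__.py from disk).

```python
from __future__ import annotations

import ast


def imported_names(source: str) -> set[str]:
    """Collect every dotted component named in the import statements of `source`."""
    names: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                names.update(alias.name.split("."))
        elif isinstance(node, ast.ImportFrom):
            # Relative imports ("from .x import y") keep module="x", level=1.
            if node.module:
                names.update(node.module.split("."))
            # "from . import x" may import x as a submodule, so record it too.
            names.update(alias.name for alias in node.names)
    return names


def depends_on(source: str, module: str) -> bool:
    """True if `source` imports `module` (coarse: also matches imported names)."""
    return module in imported_names(source)


# Hypothetical module bodies; a real test would read the files from disk.
GUESS_SRC = "from .__common__ import *\n\ndef guess(text): ...\n"
COMMON_SRC = "import codecs\nimport re\n\ndef add(ename): ...\n"

print(depends_on(GUESS_SRC, "__common__"))  # guess -> common is the allowed direction
print(depends_on(COMMON_SRC, "_guess"))     # must be False to keep the dependency one-way
```

The check is deliberately coarse: it may report a false positive if a function happens to share a module's name, which errs on the side of flagging a possible cycle.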

Regression Testing Proof

Validated with output-based regression testing (27 clusters across Morse, Braille, DNA, Galactic, Kenny, Affine, Vigenere, Atbash, ROT13, Hexagram, Rick Astley, Bacon, Barbie, Navajo, Baudot, Base32):

  • V1: All 27 cluster fingerprints GREEN
  • V2: Raw output identical to pre-refactor baseline
  • V3: Cross-fingerprint check matches the saved ground truth
  • V4: Zero drift across 5 consecutive runs
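The fingerprint checks (V1, V2) can be pictured as hashing each cluster's raw output and comparing it with a stored baseline. This is a minimal sketch under stated assumptions: it does not reproduce the Regrets tool, the cluster names and samples are hypothetical, and Python's standard rot13, base64 and hex codecs stand in for codext's codecs so the code runs without codext installed.

```python
import codecs
import hashlib
import json

# Hypothetical clusters: name -> (stdlib codec, sample inputs).
# Stdlib codecs stand in for codext's codecs so the sketch is self-contained.
CLUSTERS = {
    "rot13": ("rot13", ["hello", "Hello, World!"]),
    "base64": ("base64", [b"hello", b"\x00\xff"]),
    "hex": ("hex", [b"hello", b"codext"]),
}


def cluster_output(codec, samples):
    """Encode every sample; bytes results are hex-encoded so they fit in JSON."""
    outputs = []
    for sample in samples:
        encoded = codecs.encode(sample, codec)
        outputs.append(encoded if isinstance(encoded, str) else encoded.hex())
    return outputs


def fingerprint(codec, samples):
    """SHA-256 of a JSON dump of the cluster's raw outputs."""
    payload = json.dumps(cluster_output(codec, samples))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def changed_clusters(baseline):
    """Names of clusters whose current fingerprint differs from the baseline."""
    return [
        name
        for name, (codec, samples) in CLUSTERS.items()
        if fingerprint(codec, samples) != baseline.get(name)
    ]


# Usage: record a baseline (normally before the refactor), then re-check.
baseline = {name: fingerprint(c, s) for name, (c, s) in CLUSTERS.items()}
print(changed_clusters(baseline))  # an empty list means every cluster is "GREEN"
```

In a real run the baseline would be recorded on the pre-refactor commit and saved (for example as JSON), then compared after the refactor; computing both in one process, as above, only demonstrates the mechanics.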

…ss.py

Decomposition refactoring of the god object __common__.py (1510 lines):

BEFORE:
- __common__.py: 1510 lines — contained codec registration, guess/rank/score,
  utilities, error handling, language detection, and more in a single file

AFTER:
- __common__.py: 1182 lines — codec registration, utilities, error handling
- _guess.py: 343 lines — guess/rank/score logic as a cohesive module

This refactoring follows the single responsibility principle:
- _guess.py owns all guessing/ranking/scoring functionality
- __common__.py owns codec registration and utility functions

Changes:
- Extracted _detect(), _lang(), _load_lang_backend(), _validate()
- Extracted __guess(), __make_encodings_dict(), __rank()
- Extracted _Text class and __score()
- Extracted public guess() and rank() functions
- Preserved all imports and the monkey-patching of codecs.guess and codecs.rank
- No behavioral changes — all 27 regression test clusters pass
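The monkey-patching that the Changes list says is preserved amounts to assigning the public functions as attributes of the standard codecs module. A minimal sketch, assuming the assignment runs at import time of whichever module defines the functions; guess and rank below are hypothetical stubs, not codext's implementations or signatures.

```python
import codecs


# Hypothetical placeholders for the public functions moved into _guess.py.
# Their real codext signatures and behaviour are NOT reproduced here.
def guess(text, *args, **kwargs):
    """Stand-in for codext's guess(); returns an empty result."""
    return {}


def rank(text, *args, **kwargs):
    """Stand-in for codext's rank(); returns an empty ranking."""
    return []


# Attach the functions to the stdlib codecs module so existing callers of
# codecs.guess(...) and codecs.rank(...) keep working after the move.
codecs.guess = guess
codecs.rank = rank

print(codecs.guess("test") == {}, codecs.rank("test") == [])
```

A regression test could then assert that codecs.guess is the same object exported by _guess.py, which would catch the attribute being dropped or left pointing at an old copy after the move.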

Verified with output-based regression testing (Regrets tool):
- V1: All 27 cluster fingerprints GREEN
- V2: Direct output comparison identical to pre-refactor baseline
- V3: Cross-fingerprint verification matches saved truth
- Drift: 5 consecutive runs — all STABLE, zero drift
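The drift check (5 consecutive runs) matters because output can depend on per-process state such as string hash randomization, which changes set iteration order between interpreter runs. A minimal sketch, not the Regrets tool: each run executes a snippet in a fresh interpreter via subprocess, hashes its stdout, and reports STABLE only if all fingerprints agree. The snippet uses the stdlib rot13 codec as a stand-in for a codext call.

```python
import hashlib
import subprocess
import sys


def run_once(snippet: str) -> str:
    """Run `snippet` in a fresh Python interpreter and hash its stdout."""
    result = subprocess.run(
        [sys.executable, "-c", snippet],
        capture_output=True,
        text=True,
        check=True,
    )
    return hashlib.sha256(result.stdout.encode("utf-8")).hexdigest()


def drift_check(snippet: str, runs: int = 5) -> str:
    """Return "STABLE" if every run yields the same fingerprint, else "DRIFT"."""
    fingerprints = {run_once(snippet) for _ in range(runs)}
    return "STABLE" if len(fingerprints) == 1 else "DRIFT"


# Stand-in workload: the stdlib rot13 codec is deterministic across runs.
SNIPPET = "import codecs; print(codecs.encode('hello', 'rot13'))"
print(drift_check(SNIPPET))
```

By contrast, a snippet such as print(list({'a', 'b', 'c'})) can report DRIFT, since its order depends on the per-process hash seed unless PYTHONHASHSEED is fixed.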