
🔥 VectorForge

A lightweight, high-performance vector index engine · pure Python, zero-dependency implementation

็น้ซ”ไธญๆ–‡

pip install git+https://github.com/gitstq/VectorForge.git

๐Ÿ“– ๅฐˆๆกˆไป‹็ดน

VectorForge ๆ˜ฏไธ€ๅ€‹่ผ•้‡็ดšใ€้ซ˜ๆ•ˆ่ƒฝ็š„ๅ‘้‡็ดขๅผ•ๅผ•ๆ“Ž๏ผŒๅฎŒๅ…จไฝฟ็”จ็ด” Python ๅฏฆ็พ๏ผŒ้›ถๅค–้ƒจไพ่ณด๏ผˆNumPy ็‚บๅฏ้ธๅŠ ้€Ÿไพ่ณด๏ผ‰ใ€‚ๅฎƒ็‚บๅ‘้‡็›ธไผผๅบฆๆœๅฐ‹ๆไพ›ไบ†ไธ€ๅฅ—ๅฎŒๆ•ด็š„ๅทฅๅ…ท้ˆ๏ผŒๆถต่“‹ๅคš็จฎ็ดขๅผ•ๆผ”็ฎ—ๆณ•ใ€่ท้›ขๅบฆ้‡ใ€ๆŒไน…ๅŒ–ๅ„ฒๅญ˜ใ€ๅ‘ฝไปคๅˆ—ๅทฅๅ…ทๅ’Œ่ฆ–่ฆบๅŒ–ๅ„€่กจๆฟใ€‚

็„ก่ซ–ไฝ ๆ˜ฏๅœจๅปบๆง‹่ชžๆ„ๆœๅฐ‹ๅผ•ๆ“Žใ€ๆŽจ่–ฆ็ณป็ตฑใ€ๅœ–ๅƒๆชข็ดข๏ผŒ้‚„ๆ˜ฏๅšๆฉŸๅ™จๅญธ็ฟ’ๅŽŸๅž‹้ฉ—่ญ‰๏ผŒVectorForge ้ƒฝ่ƒฝ่ฎ“ไฝ ไปฅๆœ€็ฐกๅ–ฎ็š„ๆ–นๅผๅฟซ้€ŸไธŠๆ‰‹ๅ‘้‡ๆœๅฐ‹ใ€‚

โœจ ๆ ธๅฟƒ็‰นๆ€ง

็‰นๆ€ง ๆ่ฟฐ
๐Ÿง  ๅ››็จฎ็ดขๅผ•ๆผ”็ฎ—ๆณ• HNSW๏ผˆๅˆ†ๅฑคๅฐŽ่ˆชๅฐไธ–็•Œๅœ–๏ผ‰ใ€IVF-Flat๏ผˆๅ€’ๆŽ’ๆช”ๆกˆ๏ผ‰ใ€LSH๏ผˆๅฑ€้ƒจๆ•ๆ„Ÿ้›œๆนŠ๏ผ‰ใ€Brute-Force๏ผˆๆšดๅŠ›ๆœๅฐ‹๏ผ‰
๐Ÿ“ ไธ‰็จฎ่ท้›ขๅบฆ้‡ ้ค˜ๅผฆ็›ธไผผๅบฆ๏ผˆCosine๏ผ‰ใ€ๆญๆฐ่ท้›ข๏ผˆEuclidean๏ผ‰ใ€ๅ…ง็ฉ๏ผˆInner Product๏ผ‰
๐Ÿ’พ ็ดขๅผ•ๆŒไน…ๅŒ– ๆ”ฏๆด JSON ๆ ผๅผๅบๅˆ—ๅŒ–/ๅๅบๅˆ—ๅŒ–๏ผŒ็ดขๅผ•ๅฏๅ„ฒๅญ˜ๅˆฐ็ฃ็ขŸ้šจๆ™‚่ผ‰ๅ…ฅ
๐Ÿ“‚ ๅคšๆ ผๅผ IO ๆ”ฏๆด JSON ๅ’Œ CSV ๆ ผๅผ็š„ๅ‘้‡่ณ‡ๆ–™ๅŒฏๅ…ฅๅŒฏๅ‡บ
๐Ÿ“Š TUI ็ต‚็ซฏๅ„€่กจๆฟ ๅœจ็ต‚็ซฏไธญ่ฆ–่ฆบๅŒ–ๆŸฅ็œ‹็ดขๅผ•็‹€ๆ…‹ใ€็ตฑ่จˆ่ณ‡่จŠๅ’Œๆ•ˆ่ƒฝๆŒ‡ๆจ™
โšก ๆ•ˆ่ƒฝๅŸบๆบ–ๆธฌ่ฉฆ ๅ…งๅปบ benchmark ๆจก็ต„๏ผŒไธ€้ตๅฐๆฏ”ไธๅŒ็ดขๅผ•ๆผ”็ฎ—ๆณ•็š„ๅปบๆง‹่ˆ‡ๆŸฅ่ฉขๆ•ˆ่ƒฝ
๐Ÿ–ฅ๏ธ CLI ๅ‘ฝไปคๅˆ—ๅทฅๅ…ท ๅฎŒๆ•ด็š„ๅ‘ฝไปคๅˆ—ไป‹้ข๏ผŒๆ”ฏๆดๅปบๆง‹ใ€ๆœๅฐ‹ใ€่ฝ‰ๆ›ใ€็”Ÿๆˆใ€ๅŸบๆบ–ๆธฌ่ฉฆ็ญ‰ๆ“ไฝœ
๐Ÿ ้ซ˜้šŽๆœๅฐ‹ API ็ฐกๆฝ”ๅ„ช้›…็š„ Python API๏ผŒๅนพ่กŒ็จ‹ๅผ็ขผๅณๅฏๅฎŒๆˆๅ‘้‡็ดขๅผ•่ˆ‡ๆชข็ดข

๐Ÿš€ ๅฟซ้€Ÿ้–‹ๅง‹

ๅฎ‰่ฃ

# ้€้Ž pip ๅพž GitHub ๅฎ‰่ฃ
pip install git+https://github.com/gitstq/VectorForge.git

# ๆˆ–่€…็›ดๆŽฅๅ…‹้š†ๅŽŸๅง‹็ขผไฝฟ็”จ
git clone https://github.com/gitstq/VectorForge.git
cd VectorForge
pip install -e .

ไธ€ๅˆ†้˜ไธŠๆ‰‹

from vectorforge.core import Vector, IndexConfig
from vectorforge.search import VectorSearch

# 1. ๆบ–ๅ‚™ๅ‘้‡่ณ‡ๆ–™
vectors = [Vector(id=i, values=[float(j) for j in range(128)]) for i in range(100)]

# 2. ๅปบ็ซ‹ๆœๅฐ‹ๅผ•ๆ“Ž๏ผˆ้ซ˜้šŽ API๏ผ‰
engine = VectorSearch(index_type="hnsw", config=IndexConfig(dimension=128))
engine.index(vectors)

# 3. ๆœๅฐ‹ๆœ€่ฟ‘้„ฐ
query = Vector(id=-1, values=[float(j) for j in range(128)])
results = engine.search(query, k=10)

for r in results:
    print(f"ID: {r.id}, ่ท้›ข: {r.distance:.4f}")

๐Ÿ“š ่ฉณ็ดฐไฝฟ็”จๆŒ‡ๅ—

CLI ๅ‘ฝไปคๅˆ—ๅทฅๅ…ท

VectorForge ๆไพ›ไบ†ๅŠŸ่ƒฝๅฎŒๆ•ด็š„ๅ‘ฝไปคๅˆ—ๅทฅๅ…ท vectorforge๏ผŒ่ฆ†่“‹ๅพžๅปบๆง‹ๅˆฐๆœๅฐ‹็š„ๅฎŒๆ•ดๅทฅไฝœๆต็จ‹๏ผš

# ๐Ÿ—๏ธ ๅปบๆง‹็ดขๅผ•
vectorforge build -t hnsw -d 128 -n 1000 -o my_index.json

# ๐Ÿ” ๆœๅฐ‹ๆœ€่ฟ‘้„ฐ
vectorforge search my_index.json -q "0.1,0.2,0.3,..." -k 10

# โšก ๅŸท่กŒๆ•ˆ่ƒฝๅŸบๆบ–ๆธฌ่ฉฆ
vectorforge benchmark -n 1000 -d 64

# ๐Ÿ“Š ้–‹ๅ•Ÿ็ต‚็ซฏๅ„€่กจๆฟ
vectorforge dashboard my_index.json

# ๐ŸŽฒ ็”Ÿๆˆ้šจๆฉŸๅ‘้‡่ณ‡ๆ–™
vectorforge generate -n 500 -d 128 -o vectors.json

# ๐Ÿ”„ ๆ ผๅผ่ฝ‰ๆ›๏ผˆCSV โ†’ JSON๏ผ‰
vectorforge convert -i vectors.csv -o vectors.json

Python API

้ซ˜้šŽ API๏ผˆๆŽจ่–ฆ๏ผ‰

from vectorforge.core import Vector, IndexConfig
from vectorforge.search import VectorSearch

# ๅปบ็ซ‹ๆœๅฐ‹ๅผ•ๆ“Ž
engine = VectorSearch(index_type="hnsw", config=IndexConfig(dimension=128))

# ๆ‰น้‡็ดขๅผ•ๅ‘้‡
engine.index(vectors)

# ๆœๅฐ‹ k ๅ€‹ๆœ€่ฟ‘้„ฐ
results = engine.search(query_vector, k=10)

ๅบ•ๅฑค API๏ผˆ็ฒพ็ดฐๆŽงๅˆถ๏ผ‰

from vectorforge.core import Vector, IndexConfig
from vectorforge.index import HNSWIndex

# ๅปบ็ซ‹ HNSW ็ดขๅผ•
index = HNSWIndex(IndexConfig(dimension=128, metric="cosine"))

# ๅปบๆง‹็ดขๅผ•
index.build(vectors)

# ๆœๅฐ‹
results = index.search(query, k=10)

# ๆŒไน…ๅŒ–ๅˆฐ็ฃ็ขŸ
index.save("my_index.json")

# ๅพž็ฃ็ขŸ่ผ‰ๅ…ฅ
index.load("my_index.json")

ๆ”ฏๆด็š„็ดขๅผ•ๆผ”็ฎ—ๆณ•

ๆผ”็ฎ—ๆณ• ้ฉ็”จๅ ดๆ™ฏ ็‰น้ปž
HNSW ๅคง่ฆๆจก้ซ˜็ฒพๅบฆๆœๅฐ‹ ๅˆ†ๅฑคๅœ–็ตๆง‹๏ผŒๅฌๅ›ž็އ้ซ˜๏ผŒๆŸฅ่ฉข้€Ÿๅบฆๅฟซ
IVF-Flat ไธญๅคง่ฆๆจกๅนณ่กกๆœๅฐ‹ ๅ€’ๆŽ’็ดขๅผ• + ็ฒพ็ขบ้‡ๆŽ’๏ผŒ้€Ÿๅบฆ่ˆ‡็ฒพๅบฆๅนณ่กก
LSH ่ถ…ๅคง่ฆๆจก่ฟ‘ไผผๆœๅฐ‹ ้›œๆนŠๆ˜ ๅฐ„๏ผŒไบž็ทšๆ€งๆŸฅ่ฉขๆ™‚้–“
Brute-Force ๅฐ่ฆๆจก็ฒพ็ขบๆœๅฐ‹ ้€ไธ€ๆฏ”ๅฐ๏ผŒ100% ๅฌๅ›ž็އ

ๆ”ฏๆด็š„่ท้›ขๅบฆ้‡

from vectorforge.core import IndexConfig

# ้ค˜ๅผฆ็›ธไผผๅบฆ๏ผˆ้ ่จญ๏ผŒ้ฉๅˆๆ–‡ๅญ—/่ชžๆ„ๆœๅฐ‹๏ผ‰
config_cosine = IndexConfig(dimension=128, metric="cosine")

# ๆญๆฐ่ท้›ข๏ผˆ้ฉๅˆๅœ–ๅƒ็‰นๅพตๆœๅฐ‹๏ผ‰
config_euclidean = IndexConfig(dimension=128, metric="euclidean")

# ๅ…ง็ฉ๏ผˆ้ฉๅˆๅทฒๆญธไธ€ๅŒ–็š„ๅ‘้‡๏ผ‰
config_inner = IndexConfig(dimension=128, metric="inner_product")

ๅ‘้‡่ณ‡ๆ–™ IO

from vectorforge.io import export_json, import_json, export_csv, import_csv

# ๅŒฏๅ‡บ็‚บ JSON
export_json(vectors, "vectors.json")

# ๅพž JSON ๅŒฏๅ…ฅ
vectors = import_json("vectors.json")

# ๅŒฏๅ‡บ็‚บ CSV
export_csv(vectors, "vectors.csv")

# ๅพž CSV ๅŒฏๅ…ฅ
vectors = import_csv("vectors.csv")

๐Ÿ—๏ธ ่จญ่จˆๆ€่ทฏ่ˆ‡่ฟญไปฃ่ฆๅŠƒ

่จญ่จˆๅ“ฒๅญธ

  • ้›ถไพ่ณดๅ„ชๅ…ˆ๏ผšๆ ธๅฟƒๅŠŸ่ƒฝ็ด” Python ๅฏฆ็พ๏ผŒไธไพ่ณดไปปไฝ•็ฌฌไธ‰ๆ–นๅ‡ฝๅผๅบซ๏ผŒ้™ไฝŽ้ƒจ็ฝฒ่ค‡้›œๅบฆ
  • API ๅˆ†ๅฑค่จญ่จˆ๏ผš้ซ˜้šŽ API ็ฐกๆฝ”ๆ˜“็”จ๏ผŒๅบ•ๅฑค API ้ˆๆดปๅฏๆŽง๏ผŒๆปฟ่ถณไธๅŒๅ ดๆ™ฏ้œ€ๆฑ‚
  • ๅฏๆ“ดๅฑ•ๆžถๆง‹๏ผšๅŸบๆ–ผๆŠฝ่ฑกๅŸบ้กž็š„็ดขๅผ•ไป‹้ข๏ผŒ่ผ•้ฌ†ๆ–ฐๅขžๆ–ฐ็š„็ดขๅผ•ๆผ”็ฎ—ๆณ•ๅ’Œ่ท้›ขๅบฆ้‡
  • ๅทฅๅ…ท้ˆๅฎŒๆ•ด๏ผšๅพž่ณ‡ๆ–™็”Ÿๆˆใ€ๆ ผๅผ่ฝ‰ๆ›ใ€็ดขๅผ•ๅปบๆง‹ใ€ๆœๅฐ‹ๆŸฅ่ฉขๅˆฐๆ•ˆ่ƒฝๆธฌ่ฉฆ๏ผŒไธ€ๆข้พ่ฆ†่“‹

่ฟญไปฃ่ฆๅŠƒ

  • v1.0.0 โœ… โ€” ๆ ธๅฟƒๅผ•ๆ“Ž็™ผๅธƒ๏ผšๅ››็จฎ็ดขๅผ•ๆผ”็ฎ—ๆณ•ใ€ไธ‰็จฎ่ท้›ขๅบฆ้‡ใ€CLI ๅทฅๅ…ทใ€TUI ๅ„€่กจๆฟ
  • v1.1.0 ๐Ÿ”ฎ โ€” ๆ–ฐๅขž PQ๏ผˆไน˜็ฉ้‡ๅŒ–๏ผ‰ๆผ”็ฎ—ๆณ•ใ€ๆ”ฏๆด mmap ่จ˜ๆ†ถ้ซ”ๆ˜ ๅฐ„ใ€ๆๅ‡ๅคง่ฆๆจก่ณ‡ๆ–™ๆ•ˆ่ƒฝ
  • v1.2.0 ๐Ÿ”ฎ โ€” ๅผ•ๅ…ฅ้Žๆฟพๆœๅฐ‹๏ผˆFilter Search๏ผ‰ใ€ๆ”ฏๆดๅ…ƒ่ณ‡ๆ–™้—œ่ฏๆŸฅ่ฉข
  • v2.0.0 ๐Ÿ”ฎ โ€” ๅˆ†ๆ•ฃๅผ็ดขๅผ•ๆ”ฏๆดใ€gRPC ไผบๆœ็ซฏๆจกๅผใ€Web ็ฎก็†ไป‹้ข

๐Ÿ“ฆ ๆ‰“ๅŒ…่ˆ‡้ƒจ็ฝฒ

ๅพžๅŽŸๅง‹็ขผๅฎ‰่ฃ

git clone https://github.com/gitstq/VectorForge.git
cd VectorForge
pip install -e .

ไฝœ็‚บๅ‡ฝๅผๅบซไฝฟ็”จ

# ๅœจ requirements.txt ไธญๆทปๅŠ 
vectorforge @ git+https://github.com/gitstq/VectorForge.git

็ณป็ตฑ้œ€ๆฑ‚

  • Python 3.8 ๆˆ–ๆ›ด้ซ˜็‰ˆๆœฌ
  • ็„ก้œ€ไปปไฝ•ๅค–้ƒจไพ่ณด๏ผˆNumPy ็‚บๅฏ้ธ๏ผŒๅฎ‰่ฃๅพŒๅฏ่‡ชๅ‹•ๅŠ ้€Ÿ่จˆ็ฎ—๏ผ‰

๐Ÿค ่ฒข็ปๆŒ‡ๅ—

ๆˆ‘ๅ€‘ๆญก่ฟŽไธฆๆ„Ÿ่ฌๆ‰€ๆœ‰ๅฝขๅผ็š„่ฒข็ป๏ผ็„ก่ซ–ๆ˜ฏๆไบค Bugใ€ๆ”น้€ฒๆ–‡ไปถ๏ผŒ้‚„ๆ˜ฏ่ฒข็ป็จ‹ๅผ็ขผใ€‚

  1. ๐Ÿด Fork ๆœฌๅ€‰ๅบซ
  2. ๐ŸŒฟ ๅปบ็ซ‹็‰นๆ€งๅˆ†ๆ”ฏ๏ผšgit checkout -b feature/my-new-feature
  3. ๐Ÿ’พ ๆไบคไฝ ็š„ๆ”นๅ‹•๏ผšgit commit -m 'Add some feature'
  4. ๐Ÿš€ ๆŽจ้€ๅˆฐ้ ็ซฏๅˆ†ๆ”ฏ๏ผšgit push origin feature/my-new-feature
  5. ๐Ÿ“ ๆไบค Pull Request

่ซ‹็ขบไฟๆ‰€ๆœ‰ๆไบค้€š้Ž CI ๆธฌ่ฉฆ๏ผŒไธฆ้ตๅพชๅฐˆๆกˆ็š„็จ‹ๅผ็ขผ่ฆ็ฏ„ใ€‚

๐Ÿ“„ ้–‹ๆบๅ”่ญฐ

ๆœฌๅฐˆๆกˆๅŸบๆ–ผ MIT License ้–‹ๆบใ€‚

MIT License

Copyright (c) 2024 VectorForge Contributors

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.

English

pip install git+https://github.com/gitstq/VectorForge.git

📖 About

VectorForge is a lightweight, high-performance vector index engine built entirely in pure Python with zero external dependencies (NumPy is optional for acceleration). It provides a complete toolkit for vector similarity search, featuring multiple index algorithms, distance metrics, persistent storage, CLI tools, and a visual dashboard.

Whether you're building semantic search engines, recommendation systems, image retrieval pipelines, or prototyping ML models, VectorForge gets you up and running with vector search in the simplest way possible.

✨ Key Features

| Feature | Description |
| --- | --- |
| 🧠 Four Index Algorithms | HNSW (Hierarchical Navigable Small World), IVF-Flat (Inverted File), LSH (Locality-Sensitive Hashing), Brute-Force |
| 📏 Three Distance Metrics | Cosine Similarity, Euclidean Distance, Inner Product |
| 💾 Index Persistence | Serialize/deserialize indexes to/from disk in JSON format |
| 📂 Multi-Format I/O | Import and export vector data in JSON and CSV formats |
| 📊 TUI Dashboard | Visualize index status, statistics, and performance metrics right in your terminal |
| ⚡ Benchmarking Module | Built-in benchmark tool to compare build and query performance across index algorithms |
| 🖥️ CLI Tool | Full-featured command-line interface for building, searching, converting, generating, and benchmarking |
| 🐍 Advanced Search API | Clean and elegant Python API; index and search vectors in just a few lines of code |

🚀 Quick Start

Installation

# Install via pip from GitHub
pip install git+https://github.com/gitstq/VectorForge.git

# Or clone the source and install locally
git clone https://github.com/gitstq/VectorForge.git
cd VectorForge
pip install -e .

One-Minute Tutorial

import random

from vectorforge.core import Vector, IndexConfig
from vectorforge.search import VectorSearch

# 1. Prepare vector data: 100 distinct random 128-dimensional vectors
#    (identical vectors would all be equally close to any query)
rng = random.Random(42)
vectors = [Vector(id=i, values=[rng.random() for _ in range(128)]) for i in range(100)]

# 2. Create a search engine (high-level API) and index the vectors
engine = VectorSearch(index_type="hnsw", config=IndexConfig(dimension=128))
engine.index(vectors)

# 3. Search for the 10 nearest neighbors of a random query vector
query = Vector(id=-1, values=[rng.random() for _ in range(128)])
results = engine.search(query, k=10)

for r in results:
    print(f"ID: {r.id}, Distance: {r.distance:.4f}")

📚 Detailed Usage Guide

CLI Tool

VectorForge ships with a full-featured CLI tool vectorforge that covers the entire workflow from index building to querying:

# ๐Ÿ—๏ธ Build an index
vectorforge build -t hnsw -d 128 -n 1000 -o my_index.json

# ๐Ÿ” Search for nearest neighbors
vectorforge search my_index.json -q "0.1,0.2,0.3,..." -k 10

# โšก Run performance benchmarks
vectorforge benchmark -n 1000 -d 64

# ๐Ÿ“Š Open the terminal dashboard
vectorforge dashboard my_index.json

# ๐ŸŽฒ Generate random vector data
vectorforge generate -n 500 -d 128 -o vectors.json

# ๐Ÿ”„ Convert formats (CSV โ†’ JSON)
vectorforge convert -i vectors.csv -o vectors.json

Python API

High-Level API (Recommended)

from vectorforge.core import Vector, IndexConfig
from vectorforge.search import VectorSearch

# Create a search engine
engine = VectorSearch(index_type="hnsw", config=IndexConfig(dimension=128))

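# Index vectors in bulk (`vectors` is a list of Vector objects, as in the Quick Start)
engine.index(vectors)

# Search for the k nearest neighbors (`query_vector` is a Vector with the same dimension)
results = engine.search(query_vector, k=10)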

Low-Level API (Fine-Grained Control)

from vectorforge.core import Vector, IndexConfig
from vectorforge.index import HNSWIndex

# Create an HNSW index
index = HNSWIndex(IndexConfig(dimension=128, metric="cosine"))

# Build the index
index.build(vectors)

# Search
results = index.search(query, k=10)

# Persist to disk
index.save("my_index.json")

# Load from disk
index.load("my_index.json")

Supported Index Algorithms

| Algorithm | Best For | Characteristics |
| --- | --- | --- |
| HNSW | Large-scale, high-accuracy search | Hierarchical graph structure, high recall, fast queries |
| IVF-Flat | Medium-to-large scale, balanced search | Inverted index + exact re-ranking, balanced speed and accuracy |
| LSH | Ultra-large scale, approximate search | Hash-based mapping, sub-linear query time |
| Brute-Force | Small-scale, exact search | Exhaustive comparison, 100% recall guarantee |
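
To make the "exhaustive comparison, 100% recall" entry concrete, here is a minimal standalone sketch of brute-force k-nearest-neighbor search using only the standard library. It does not use VectorForge's classes; the names `euclidean` and `brute_force_knn` and the `(id, values)` tuple layout are illustrative assumptions, not the project's implementation.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def brute_force_knn(
    data: List[Tuple[int, Sequence[float]]],
    query: Sequence[float],
    k: int,
) -> List[Tuple[int, float]]:
    """Return the k (id, distance) pairs closest to `query`.

    Every stored vector is compared against the query, so the result is
    exact; the price is one distance computation per stored vector.
    """
    scored = ((euclidean(vec, query), vec_id) for vec_id, vec in data)
    # nsmallest keeps only the k best candidates while scanning once.
    return [(vec_id, dist) for dist, vec_id in heapq.nsmallest(k, scored)]


# Hypothetical toy data: (id, values) pairs in two dimensions.
points = [(0, [0.0, 0.0]), (1, [1.0, 0.0]), (2, [5.0, 5.0]), (3, [0.0, 2.0])]
nearest = brute_force_knn(points, query=[0.0, 0.1], k=2)
print(nearest)
```

Because the scan touches every vector, query time grows linearly with the dataset size, which is why the table recommends this approach for small collections and the approximate indexes for larger ones.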

Supported Distance Metrics

from vectorforge.core import IndexConfig

# Cosine similarity (default, ideal for text/semantic search)
config_cosine = IndexConfig(dimension=128, metric="cosine")

# Euclidean distance (ideal for image feature search)
config_euclidean = IndexConfig(dimension=128, metric="euclidean")

# Inner product (ideal for pre-normalized vectors)
config_inner = IndexConfig(dimension=128, metric="inner_product")
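
The comment about pre-normalized vectors can be checked numerically. The sketch below defines the three metrics in plain Python (the function names are illustrative, and whether VectorForge reports a similarity or a distance for the cosine metric is not stated in this README) and shows that, for unit-length vectors, the inner product equals the cosine similarity and the squared Euclidean distance equals 2 - 2·cos.

```python
import math
from typing import List, Sequence


def inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of element-wise products."""
    return sum(x * y for x, y in zip(a, b))


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product divided by the product of the two vector lengths."""
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return inner_product(a, b) / (norm_a * norm_b)


def normalize(a: Sequence[float]) -> List[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(inner_product(a, a))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in a]


a, b = [3.0, 4.0], [4.0, 3.0]
ua, ub = normalize(a), normalize(b)

print(round(cosine_similarity(a, b), 6))          # 0.96
print(round(inner_product(ua, ub), 6))            # 0.96, same as the cosine
print(round(euclidean_distance(ua, ub) ** 2, 6))  # 0.08, i.e. 2 - 2 * 0.96
```

For unit-length data all three metrics therefore produce the same neighbor ranking, and the inner product is the cheapest of the three to compute.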

Vector Data I/O

from vectorforge.io import export_json, import_json, export_csv, import_csv

# Export to JSON
export_json(vectors, "vectors.json")

# Import from JSON
vectors = import_json("vectors.json")

# Export to CSV
export_csv(vectors, "vectors.csv")

# Import from CSV
vectors = import_csv("vectors.csv")
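
The README does not document the on-disk layout these functions use. As a rough illustration of what a JSON/CSV round trip involves, the standalone sketch below assumes a hypothetical layout: a JSON list of `{"id", "values"}` objects, and CSV rows of the form `id, v0, v1, ...`. The helper names (`write_json`, `read_json`, `write_csv`, `read_csv`) are made up for this example and are not part of VectorForge.

```python
import csv
import json
import os
import tempfile

# Hypothetical records; VectorForge's real file format may differ.
records = [
    {"id": 0, "values": [0.1, 0.2, 0.3]},
    {"id": 1, "values": [1.5, -2.0, 0.0]},
]


def write_json(recs, path):
    with open(path, "w", encoding="utf-8") as f:
        json.dump(recs, f)


def read_json(path):
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def write_csv(recs, path):
    # One row per vector: the id first, then each component.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        for rec in recs:
            writer.writerow([rec["id"], *rec["values"]])


def read_csv(path):
    # CSV stores everything as text, so ids and values are parsed back.
    with open(path, newline="", encoding="utf-8") as f:
        return [
            {"id": int(row[0]), "values": [float(x) for x in row[1:]]}
            for row in csv.reader(f)
        ]


with tempfile.TemporaryDirectory() as tmp:
    json_path = os.path.join(tmp, "vectors.json")
    csv_path = os.path.join(tmp, "vectors.csv")
    write_json(records, json_path)
    write_csv(read_json(json_path), csv_path)  # JSON -> CSV conversion
    round_tripped = read_csv(csv_path)

print(round_tripped == records)
```

The main difference between the two formats in this sketch is typing: JSON preserves numbers and nesting, while CSV needs an agreed column order and explicit parsing on import.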

๐Ÿ—๏ธ Design Philosophy & Roadmap

Design Philosophy

  • Zero Dependencies First: Core functionality is implemented in pure Python with no third-party libraries, minimizing deployment complexity
  • Layered API Design: The high-level API is simple and intuitive, while the low-level API offers fine-grained control for advanced use cases
  • Extensible Architecture: Abstract base classes for index interfaces make it easy to add new algorithms and distance metrics
  • Complete Toolchain: Data generation, format conversion, index building, querying, and benchmarking are all covered end to end
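
The extensible-architecture point can be sketched as follows. This is a hypothetical illustration of an abstract index interface, not VectorForge's actual base class: the names `BaseIndex`, `LinearScanIndex`, `REGISTRY`, and `create_index` are assumptions made for this example.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Sequence, Tuple


class BaseIndex(ABC):
    """Hypothetical contract that every index algorithm would implement."""

    @abstractmethod
    def build(self, vectors: Dict[int, Sequence[float]]) -> None:
        """Ingest a mapping of id -> vector."""

    @abstractmethod
    def search(self, query: Sequence[float], k: int) -> List[Tuple[int, float]]:
        """Return up to k (id, distance) pairs, closest first."""


class LinearScanIndex(BaseIndex):
    """Simplest possible implementation: compare against every vector."""

    def __init__(self) -> None:
        self._vectors: Dict[int, Sequence[float]] = {}

    def build(self, vectors: Dict[int, Sequence[float]]) -> None:
        self._vectors = dict(vectors)

    def search(self, query: Sequence[float], k: int) -> List[Tuple[int, float]]:
        # Squared Euclidean distance gives the same ordering as the true distance.
        scored = sorted(
            (sum((x - y) ** 2 for x, y in zip(vec, query)), vec_id)
            for vec_id, vec in self._vectors.items()
        )
        return [(vec_id, dist) for dist, vec_id in scored[:k]]


# A new algorithm is added by subclassing BaseIndex and registering it here.
REGISTRY = {"linear": LinearScanIndex}


def create_index(name: str) -> BaseIndex:
    return REGISTRY[name]()


index = create_index("linear")
index.build({1: [0.0, 0.0], 2: [3.0, 4.0], 3: [1.0, 1.0]})
print(index.search([0.9, 0.9], k=2))
```

With this kind of layering, callers depend only on `build` and `search`, so swapping one algorithm for another becomes a one-line change at construction time.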

Roadmap

  • v1.0.0 ✅ - Core engine release: four index algorithms, three distance metrics, CLI tool, TUI dashboard
  • v1.1.0 🔮 - Add PQ (Product Quantization) algorithm, mmap memory mapping support, improved large-scale data performance
  • v1.2.0 🔮 - Introduce filtered search, metadata-associated queries
  • v2.0.0 🔮 - Distributed index support, gRPC server mode, web management UI

📦 Packaging & Deployment

Install from Source

git clone https://github.com/gitstq/VectorForge.git
cd VectorForge
pip install -e .

Use as a Library

# Add to requirements.txt
vectorforge @ git+https://github.com/gitstq/VectorForge.git

System Requirements

  • Python 3.8 or later
  • No external dependencies required (NumPy is optional and enables automatic compute acceleration when installed)
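
One common way to make NumPy an optional accelerator is a guarded import with a pure-Python fallback. The sketch below shows that general pattern; it is an assumption about how such a feature can be built, not a description of VectorForge's internals, and the function name `dot` is illustrative.

```python
try:
    import numpy as np  # optional accelerator; not required
except ImportError:
    np = None


def dot(a, b):
    """Inner product that uses NumPy when available, plain Python otherwise."""
    if np is not None:
        return float(np.dot(np.asarray(a, dtype=float), np.asarray(b, dtype=float)))
    return sum(x * y for x, y in zip(a, b))


print(dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))  # 32.0 on either code path
```

Both branches return a plain Python float, so callers never need to know which path was taken.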

๐Ÿค Contributing

We welcome and appreciate contributions of all kinds โ€” bug reports, documentation improvements, or code contributions.

  1. ๐Ÿด Fork this repository
  2. ๐ŸŒฟ Create a feature branch: git checkout -b feature/my-new-feature
  3. ๐Ÿ’พ Commit your changes: git commit -m 'Add some feature'
  4. ๐Ÿš€ Push to the remote branch: git push origin feature/my-new-feature
  5. ๐Ÿ“ Submit a Pull Request

Please make sure all submissions pass CI tests and follow the project's code conventions.

📄 License

This project is licensed under the MIT License.

MIT License

Copyright (c) 2024 VectorForge Contributors

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.

Made with 🔥 by VectorForge Contributors

About

🔥 VectorForge - Lightweight High-Performance Vector Index Engine | Pure Python, Zero Dependencies, HNSW/IVF/LSH/BruteForce, TUI Dashboard, Multi-Format I/O, Cross-Platform
