This notebook walks you through using Tair as a vector database for OpenAI embeddings.
It covers the end-to-end process of:
- Using precomputed embeddings created by the OpenAI API.
- Storing the embeddings in a cloud instance of Tair.
- Converting a raw text query to an embedding with the OpenAI API.
- Using Tair to perform a nearest neighbour search in the created collection.
Tair is a cloud-native in-memory database service developed by Alibaba Cloud. Tair is compatible with open-source Redis and provides a variety of data models and enterprise-class capabilities to support real-time online scenarios. Tair also offers persistent memory-optimized instances based on non-volatile memory (NVM) storage. These instances can reduce costs by 30%, ensure data persistence, and provide almost the same performance as in-memory databases. Tair is widely used in areas such as government affairs, finance, manufacturing, healthcare, and pan-Internet to meet high-speed query and computing requirements.
TairVector is an in-house data structure that provides high-performance real-time storage and retrieval of vectors. TairVector provides two indexing algorithms: Hierarchical Navigable Small World (HNSW) and Flat Search. Additionally, TairVector supports multiple distance functions, such as Euclidean distance, inner product, and Jaccard distance. Compared with traditional vector retrieval services, TairVector has the following advantages:
- Stores all data in memory and supports real-time index updates to reduce the latency of read and write operations.
- Uses an optimized data structure in memory to better utilize storage capacity.
- Functions as an out-of-the-box data structure in a simple and efficient architecture without complex modules or dependencies.
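
To make the indexing and distance terminology above concrete, here is a minimal pure-Python sketch of what a flat (exhaustive) search does with each of the three distance functions mentioned. It is an illustration only, not TairVector's implementation or API: the names `flat_search`, `euclidean_distance`, `inner_product_distance`, and `jaccard_distance` are hypothetical, the in-memory `dict` stands in for an index, and negating the inner product so that "smaller is closer" is an assumption made for this example.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def euclidean_distance(a: Vector, b: Vector) -> float:
    """Straight-line (L2) distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def inner_product_distance(a: Vector, b: Vector) -> float:
    """Negated inner product, so that a smaller value means more similar.

    The negation is an assumption of this sketch, chosen so that all three
    functions can be ranked in ascending order.
    """
    return -sum(x * y for x, y in zip(a, b))


def jaccard_distance(a: Vector, b: Vector) -> float:
    """Jaccard distance for binary vectors: 1 - |A and B| / |A or B|."""
    set_a = {i for i, x in enumerate(a) if x}
    set_b = {i for i, x in enumerate(b) if x}
    union = set_a | set_b
    if not union:
        return 0.0  # two all-zero vectors are treated as identical
    return 1.0 - len(set_a & set_b) / len(union)


def flat_search(
    index: Dict[str, Vector],
    query: Vector,
    k: int,
    distance: Callable[[Vector, Vector], float] = euclidean_distance,
) -> List[Tuple[str, float]]:
    """Compare the query against every stored vector and return the k closest.

    This is exact but costs one distance computation per stored vector.
    HNSW avoids the full scan by walking a layered proximity graph, trading
    a small amount of recall for much lower query time on large collections.
    """
    scored = [(key, distance(vec, query)) for key, vec in index.items()]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


# Toy index with hypothetical keys and 2-dimensional vectors.
index = {
    "doc:a": [0.0, 0.0],
    "doc:b": [1.0, 1.0],
    "doc:c": [3.0, 4.0],
}
query = [0.9, 1.2]

nearest_l2 = flat_search(index, query, k=2)
nearest_ip = flat_search(index, query, k=1, distance=inner_product_distance)
print(nearest_l2)
print(nearest_ip)
```

Note that the choice of distance function changes the ranking: under Euclidean distance the closest entry to the query is `doc:b`, while under inner product the largest-magnitude vector `doc:c` ranks first. The distance function should therefore match how the embeddings are meant to be compared.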