Expert Data Architect & AI Engineer: ClickHouse Vector Search at Scale (400M+ Rows)
Project # 210277
Job Statistics
13 bids | Budget: 10,000 - 25,000 ILS | Bidding ends in 37 days, 20 hrs, 44 mins
Bid range: 135 - 350 ILS/hour, 25,000 ILS/project | Average bid: 265 ILS/hour, 25,000 ILS/project
Posted: 11:11, 21 Dec. 2025
Ends: 11:41, 9 Feb. 2026
I am looking for a high-level Data Architect and AI Developer to design and implement a high-performance retrieval and analysis system. The project involves managing a massive dataset of 400 million+ records in a single table, enabling both semantic search and analytical capabilities.
Scope of Work:
DB Infrastructure: Setup and optimization of a ClickHouse cluster. Implementation of efficient schema design for high-volume data.
Vector Search Implementation: Configuring ClickHouse for vector/semantic search using vector_similarity (HNSW) indexes. You must ensure sub-second latency for vector queries at the specified scale.
Data Pipeline: Developing a process for generating Embeddings from raw text and ingesting them into the DB.
AI Agent Layer: Creating an intelligent agent that translates natural language queries into:
Semantic searches (Vector-based).
Analytical SQL queries (Text-to-SQL).
Synthesized responses using an LLM (RAG architecture).
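To make the DB-side scope concrete, here is a minimal sketch of the schema and query shape this kind of setup usually takes, expressed as ClickHouse SQL strings built in Python. The table name (`docs`), column names, embedding dimensionality, and index parameters are illustrative assumptions, not part of the brief; a production table at 400M rows would also need a sharding/replication layout (e.g. ReplicatedMergeTree) that is omitted here.

```python
# Sketch of a ClickHouse table with an HNSW vector index, plus a kNN
# query builder. All names and parameters are assumptions for illustration.

EMBEDDING_DIM = 1024  # assumed embedding dimensionality

DDL = f"""
CREATE TABLE docs
(
    id        UInt64,
    text      String,
    embedding Array(Float32),
    INDEX idx_vec embedding TYPE vector_similarity('hnsw', 'cosineDistance', {EMBEDDING_DIM})
)
ENGINE = MergeTree
ORDER BY id
"""

def knn_query(query_vector: list[float], k: int = 10) -> str:
    """Build a nearest-neighbour query. ClickHouse can use the HNSW index
    when ordering by the indexed distance function with a LIMIT."""
    vec = ", ".join(f"{x:.6f}" for x in query_vector)
    return f"""
SELECT id, text, cosineDistance(embedding, [{vec}]) AS dist
FROM docs
ORDER BY dist ASC
LIMIT {k}
"""
```

The `ORDER BY dist LIMIT k` shape matters: it is the pattern the vector index can accelerate, whereas an unbounded scan over the distance expression forces a full-table read.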
Technical Stack:
Database: ClickHouse.
AI Frameworks: LangChain or LlamaIndex.
Languages: Python (for the AI/Embedding layer) or Node.js.
Models: OpenAI / Anthropic or local Embedding models (HuggingFace).
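For the agent layer, the core design decision is routing: deciding whether a natural-language question becomes a semantic (vector) search or an analytical Text-to-SQL query, then synthesizing an answer over the retrieved rows. The sketch below uses a keyword heuristic as a stand-in for the router; in a real build this would be an LLM-based classifier (e.g. via LangChain or LlamaIndex), and the prompt template is an assumption, not a prescribed format.

```python
# Toy router for the agent layer. A production system would classify the
# question with an LLM; the keyword heuristic here is only a stand-in.

ANALYTICAL_HINTS = ("how many", "count", "average", "sum",
                    "per month", "group by", "trend", "top 10")

def route(question: str) -> str:
    """Return 'sql' for analytical questions, 'vector' for semantic ones."""
    q = question.lower()
    if any(hint in q for hint in ANALYTICAL_HINTS):
        return "sql"     # Text-to-SQL path
    return "vector"      # semantic-search path

def build_rag_prompt(question: str, passages: list[str]) -> str:
    """RAG step: stuff retrieved passages into the LLM prompt as context."""
    context = "\n---\n".join(passages)
    return (f"Answer using only the context below.\n\n"
            f"Context:\n{context}\n\nQuestion: {question}")
```

A hybrid path (vector retrieval followed by an analytical filter, or vice versa) is also common; the router would then return a plan rather than a single label.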
Requirements / Qualifications:
Proven experience with ClickHouse at scale: You must understand Sharding, ReplicatedMergeTree, and memory management.
Vector DB Expertise: Deep understanding of HNSW or other vector indexing methods.
LLM Integration: Experience in building RAG (Retrieval-Augmented Generation) systems.
Performance Engineering: Ability to optimize queries that scan hundreds of millions of rows.
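For context on the HNSW requirement: HNSW trades exactness for speed, and the baseline it approximates is an exact O(n·d) top-k scan over the same distance metric. A pure-Python sketch of that baseline (toy data, not the production path) makes the trade-off concrete:

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity; the same metric as ClickHouse's cosineDistance."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (na * nb)

def exact_top_k(query, vectors, k=3):
    """Brute-force scan over all vectors: O(n*d) per query.
    HNSW replaces this with a greedy walk over a layered proximity graph."""
    order = sorted(range(len(vectors)),
                   key=lambda i: cosine_distance(query, vectors[i]))
    return order[:k]

docs = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
nearest = exact_top_k([1.0, 0.0], docs, k=2)  # indices 0 and 2 are closest
```

At 400M rows the exact scan is what sub-second latency rules out, which is why the posting asks for hands-on HNSW tuning (graph parameters, memory footprint) rather than brute force.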