Expert Data Architect & AI Engineer: ClickHouse Vector Search at Scale (400M+ Rows)
Project # 210277
Job Statistics
13 bids | Budget: 10,000 - 25,000 ILS | Bidding ends in 37 days, 20 hrs, 44 mins
Bid range: 135 - 350 ILS/hour, 25,000 ILS/project | Average bid: 265 ILS/hour, 25,000 ILS/project
Posted: 11:11, 21 Dec. 2025
Ends: 11:41, 9 Feb. 2026
I am looking for a high-level Data Architect and AI Developer to design and implement a high-performance retrieval and analysis system. The project involves managing a massive dataset of 400 million+ records in a single table, enabling both semantic search and analytical capabilities.
Scope of Work:
DB Infrastructure: Setup and optimization of a ClickHouse cluster. Implementation of efficient schema design for high-volume data.
Vector Search Implementation: Configuring ClickHouse for vector/semantic search using vector_similarity (HNSW) indexes. You must ensure sub-second latency for vector queries at the specified scale.
Data Pipeline: Developing a process for generating Embeddings from raw text and ingesting them into the DB.
AI Agent Layer: Creating an intelligent agent that translates natural language queries into:
Semantic searches (Vector-based).
Analytical SQL queries (Text-to-SQL).
Synthesized responses using an LLM (RAG architecture).
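To make the DB-side scope concrete, here is a minimal sketch of the schema and query shape this kind of setup usually takes, expressed as ClickHouse SQL strings built in Python. The table name (`docs`), column names, embedding dimensionality, and index parameters are illustrative assumptions, not part of the brief; a production table at 400M rows would also need a sharding/replication layout (e.g. ReplicatedMergeTree) that is omitted here.

```python
# Sketch of a ClickHouse table with an HNSW vector index, plus a kNN
# query builder. All names and parameters are assumptions for illustration.

EMBEDDING_DIM = 1024  # assumed embedding dimensionality

DDL = f"""
CREATE TABLE docs
(
    id        UInt64,
    text      String,
    embedding Array(Float32),
    INDEX idx_vec embedding TYPE vector_similarity('hnsw', 'cosineDistance', {EMBEDDING_DIM})
)
ENGINE = MergeTree
ORDER BY id
"""

def knn_query(query_vector: list[float], k: int = 10) -> str:
    """Build a nearest-neighbour query. ClickHouse can use the HNSW index
    when ordering by the indexed distance function with a LIMIT."""
    vec = ", ".join(f"{x:.6f}" for x in query_vector)
    return f"""
SELECT id, text, cosineDistance(embedding, [{vec}]) AS dist
FROM docs
ORDER BY dist ASC
LIMIT {k}
"""
```

The `ORDER BY dist LIMIT k` shape matters: it is the pattern the vector index can accelerate, whereas an unbounded scan over the distance expression forces a full-table read.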
Technical Stack:
Database: ClickHouse.
AI Frameworks: LangChain or LlamaIndex.
Languages: Python (for the AI/Embedding layer) or Node.js.
Models: OpenAI / Anthropic or local Embedding models (HuggingFace).
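For the agent layer, the core design decision is routing: deciding whether a natural-language question becomes a semantic (vector) search or an analytical Text-to-SQL query, then synthesizing an answer over the retrieved rows. The sketch below uses a keyword heuristic as a stand-in for the router; in a real build this would be an LLM-based classifier (e.g. via LangChain or LlamaIndex), and the prompt template is an assumption, not a prescribed format.

```python
# Toy router for the agent layer. A production system would classify the
# question with an LLM; the keyword heuristic here is only a stand-in.

ANALYTICAL_HINTS = ("how many", "count", "average", "sum",
                    "per month", "group by", "trend", "top 10")

def route(question: str) -> str:
    """Return 'sql' for analytical questions, 'vector' for semantic ones."""
    q = question.lower()
    if any(hint in q for hint in ANALYTICAL_HINTS):
        return "sql"     # Text-to-SQL path
    return "vector"      # semantic-search path

def build_rag_prompt(question: str, passages: list[str]) -> str:
    """RAG step: stuff retrieved passages into the LLM prompt as context."""
    context = "\n---\n".join(passages)
    return (f"Answer using only the context below.\n\n"
            f"Context:\n{context}\n\nQuestion: {question}")
```

A hybrid path (vector retrieval followed by an analytical filter, or vice versa) is also common; the router would then return a plan rather than a single label.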
Requirements / Qualifications:
Proven experience with ClickHouse at scale: You must understand Sharding, ReplicatedMergeTree, and memory management.
Vector DB Expertise: Deep understanding of HNSW or other vector indexing methods.
LLM Integration: Experience in building RAG (Retrieval-Augmented Generation) systems.
Performance Engineering: Ability to optimize queries that scan hundreds of millions of rows.
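For context on the HNSW requirement: HNSW trades exactness for speed, and the baseline it approximates is an exact O(n·d) top-k scan over the same distance metric. A pure-Python sketch of that baseline (toy data, not the production path) makes the trade-off concrete:

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity; the same metric as ClickHouse's cosineDistance."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (na * nb)

def exact_top_k(query, vectors, k=3):
    """Brute-force scan over all vectors: O(n*d) per query.
    HNSW replaces this with a greedy walk over a layered proximity graph."""
    order = sorted(range(len(vectors)),
                   key=lambda i: cosine_distance(query, vectors[i]))
    return order[:k]

docs = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
nearest = exact_top_k([1.0, 0.0], docs, k=2)  # indices 0 and 2 are closest
```

At 400M rows the exact scan is what sub-second latency rules out, which is why the posting asks for hands-on HNSW tuning (graph parameters, memory footprint) rather than brute force.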