Julius Lipp

AI & Search

ProRank: Prompt Warmup via Reinforcement Learning for Small Language Models Reranking

Jun 4, 2025

This two-stage method uses GRPO and fine-grained scoring from token logits to enable small language models (SLMs) to outperform much larger competitors in document reranking.

The Hidden Ceiling: How OCR Quality Limits RAG Performance

May 14, 2025

Blog post demonstrating how OCR errors impose a performance ceiling on RAG systems, and how multimodal models can lift retrieval quality above that ceiling.

Baked-in Brilliance: Reranking Meets RL with mxbai-rerank-v2

Mar 13, 2025

Second-generation reranking models using reinforcement learning, supporting 100+ languages with up to 32k token context.

Every Byte Matters: Introducing mxbai-embed-xsmall-v1

Oct 14, 2024

Compact 22.7M parameter embedding model with support for binary quantization and matryoshka embeddings.
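As a rough illustration of the two techniques named above (not the model's actual code), the sketch below truncates a Matryoshka-style embedding to its first few dimensions and binarizes a vector to one bit per dimension. The function names and the toy vectors are hypothetical, and the sign threshold at zero is an assumption.

```python
import math

def truncate_matryoshka(vec, dims):
    """Keep the first `dims` components and re-normalize to unit length."""
    head = list(vec[:dims])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0:
        return head
    return [x / norm for x in head]

def binarize(vec):
    """Binary quantization: one bit per dimension (1 if positive), packed into an int."""
    bits = 0
    for x in vec:
        bits = (bits << 1) | (1 if x > 0 else 0)
    return bits

def hamming(a, b):
    """Number of differing bits between two packed binary embeddings."""
    return bin(a ^ b).count("1")

# Toy vectors standing in for real embeddings (illustrative values only).
doc = [0.6, -0.8, 0.1, -0.2]
query = [0.5, 0.5, -0.1, -0.3]
short_doc = truncate_matryoshka(doc, 2)            # first 2 of 4 dimensions
distance = hamming(binarize(doc), binarize(query))  # compare 1-bit codes
```

Truncation trades some accuracy for smaller vectors, while binarization stores each dimension in a single bit and lets candidates be compared by Hamming distance.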

Baking in Performance - Dynamic Batching with Batched

Sep 16, 2024

A library that adds dynamic batching to inference systems, grouping requests/tensors to maximize model throughput.
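To make the idea of dynamic batching concrete, here is a minimal asyncio sketch. It is not the Batched library's API: the `DynamicBatcher` class, its parameters, and the `demo` coroutine are hypothetical names chosen for illustration. Requests that arrive within a short window are grouped and passed to the model function in a single call.

```python
import asyncio

class DynamicBatcher:
    """Hypothetical helper: groups concurrent requests into one batch call."""

    def __init__(self, batch_fn, max_batch_size=8, max_wait_s=0.01):
        self.batch_fn = batch_fn              # takes a list, returns a list of equal length
        self.max_batch_size = max_batch_size
        self.max_wait_s = max_wait_s
        self.queue = asyncio.Queue()
        self._worker = None

    async def submit(self, item):
        # Start the background worker lazily, inside the running event loop.
        if self._worker is None:
            self._worker = asyncio.create_task(self._run())
        fut = asyncio.get_running_loop().create_future()
        await self.queue.put((item, fut))
        return await fut

    async def _run(self):
        loop = asyncio.get_running_loop()
        while True:
            batch = [await self.queue.get()]  # block until the first request arrives
            deadline = loop.time() + self.max_wait_s
            # Collect more requests until the batch is full or the window closes.
            while len(batch) < self.max_batch_size:
                timeout = deadline - loop.time()
                if timeout <= 0:
                    break
                try:
                    batch.append(await asyncio.wait_for(self.queue.get(), timeout))
                except asyncio.TimeoutError:
                    break
            try:
                results = self.batch_fn([item for item, _ in batch])
                for (_, fut), result in zip(batch, results):
                    fut.set_result(result)
            except Exception as exc:          # propagate failures to every caller
                for _, fut in batch:
                    if not fut.done():
                        fut.set_exception(exc)

async def demo():
    calls = []                                # records the size of each batch call

    def double_all(xs):                       # stand-in for a model forward pass
        calls.append(len(xs))
        return [x * 2 for x in xs]

    batcher = DynamicBatcher(double_all, max_batch_size=4, max_wait_s=0.05)
    results = await asyncio.gather(*(batcher.submit(i) for i in range(4)))
    return results, calls
```

Running `asyncio.run(demo())` submits four requests concurrently; each caller still receives its own result, while the stand-in model function is invoked on grouped inputs rather than once per request.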

Getting Better with Baguetter - New Retrieval Testing Framework

Aug 23, 2024

Open-source information retrieval testing framework supporting sparse, dense, and hybrid search with unified benchmarking.

BM𝒳: A Freshly Baked Take on BM25

Aug 12, 2024

Enhanced lexical search algorithm improving BM25 with entropy-weighted similarity and weighted query augmentation.

Open Source Gets DE-licious: Mixedbread x deepset German/English Embeddings

Jul 18, 2024

Collaboration with deepset producing high-performance German/English embedding models with support for binary quantization/MRL.

ColBERTus Maximus - Introducing mxbai-colbert-large-v1

Mar 19, 2024

ColBERT model for reranking and retrieval, outperforming other ColBERT and cross-encoder models on BEIR.

Fresh 2D-Matryoshka Embedding Model

Mar 15, 2024

Embedding model supporting Matryoshka for both hidden layers and embeddings, enabling flexible 2D dimensionality reduction.

Open Source Strikes Bread - New Fluffy Embedding Model

Mar 8, 2024

State-of-the-art embedding model (mxbai-embed-large-v1) trained on 700M+ data pairs, outperforming OpenAI's text-embedding-3-large.

Boost Your Search With The Crispy Mixedbread Rerank Models

Feb 29, 2024

Family of open-source reranking models (xsmall, base, large).

AdmissionHacks

Aug 15, 2023

AI-powered college finder for aspiring students.
