AI Zouk Music Recommender

Using RAG and Vector Search to Recommend Zouk Music Sets

AI
RAG
ChromaDB
Vectorstore
Author

Lucas Okwudishu

Published

December 15, 2023

Introduction

This side project combines two passions: AI application development and Brazilian music and dance. The goal was to build an AI assistant that lets people find music through natural conversation, while learning how to create user-facing AI applications beyond the analytics workflows used in professional settings.

Finding good music is still surprisingly hard. Most people know the feeling of wanting something specific, say “chill background music for studying” or “upbeat tracks for a Saturday night”, but scrolling through endless playlists rarely delivers what you’re actually looking for.

Traditional music apps rely on filters and categories, but people think about music in terms of context and feeling. They want to describe what they need and get relevant recommendations, similar to asking a knowledgeable friend who understands both your taste and the situation.

The Brazilian music community, particularly around Zouk and related genres, has a rich culture of DJs sharing sets on platforms like SoundCloud. These sets often have detailed descriptions, tags, and metadata that capture the mood and style—perfect data for training an AI system to understand musical context and preferences.

The Application

The interface is straightforward. Users type requests like “Find me some chill Zouk sets for a relaxed evening” and receive specific recommendations with DJ details and listening links.

The system uses semantic understanding rather than keyword matching. It connects “chill” with “relaxed” and “evening” with “late-night vibes,” enabling contextual interpretation that makes conversations feel natural rather than robotic.
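
As an illustration (not the app’s production code), here is a minimal sketch of the idea behind semantic matching, assuming the OpenAI embeddings API; the model name and the example phrases are purely illustrative:

```python
# Minimal sketch of why semantic search links "chill" with "relaxed".
# Assumes the OpenAI embeddings API; the model name is illustrative.
import math
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def embed(text: str) -> list[float]:
    """Return the embedding vector for a piece of text."""
    resp = client.embeddings.create(model="text-embedding-3-small", input=text)
    return resp.data[0].embedding

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

# "Chill" and "relaxed" land close together in embedding space, so a
# query using one word matches set descriptions using the other.
print(cosine(embed("chill background zouk set"), embed("relaxed late-night zouk mix")))
```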

Conversation memory allows for follow-up queries. When someone asks for “something similar but with more vocals,” the system references previous recommendations to understand what “similar” means in that specific context.

Technical Implementation

The system uses Retrieval Augmented Generation (RAG): rather than answering from the model’s parameters alone, it first retrieves relevant records from a database and passes them to the LLM as context for generating the answer.

The process:

  1. Scrape data about DJ sets from Mixcloud: descriptions, tags, play counts, and upload dates. This data is then embedded into a Chroma vector store.
  2. Find the most relevant sets based on semantic similarity to user queries.
  3. An LLM (gpt-4o-mini) processes that information and generates natural language responses.

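In place of a flow diagram, here is a minimal sketch of steps 1 and 2, assuming LangChain’s Chroma integration and OpenAI embeddings; the field names, sample record, and persist path are illustrative:

```python
# Sketch of steps 1-2: embed scraped set metadata into a Chroma store,
# then retrieve by semantic similarity. Field names and the persist
# path are assumptions for illustration.
from langchain_chroma import Chroma
from langchain_core.documents import Document
from langchain_openai import OpenAIEmbeddings

# Step 1: each scraped set becomes a Document whose page_content is the
# text that gets embedded; structured fields ride along as metadata.
sets = [
    {"title": "Sunset Zouk Vibes", "dj": "DJ Example", "tags": "chill, melodic",
     "description": "Smooth, melodic zouk for winding down.", "url": "https://..."},
]
docs = [
    Document(
        page_content=f"{s['title']}. {s['description']} Tags: {s['tags']}",
        metadata={"dj": s["dj"], "url": s["url"]},
    )
    for s in sets
]

vectorstore = Chroma.from_documents(
    docs,
    embedding=OpenAIEmbeddings(),
    persist_directory="./chroma_db",  # local persistence
)

# Step 2: semantic similarity search against a user query.
results = vectorstore.similarity_search("chill sets for a relaxed evening", k=3)
for doc in results:
    print(doc.metadata["dj"], "-", doc.page_content[:60])
```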

The implementation uses LangChain and LangGraph to create AI agents that use tools for step-by-step problem solving. When users ask questions, the AI decides whether to search for information or respond directly. For search requests, it calls a custom retrieval tool that queries the ChromaDB vector database and returns relevant music sets. The AI then processes those results into natural responses.
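
A simplified sketch of that wiring, using LangGraph’s prebuilt ReAct agent and the vector store from the previous snippet; the tool name and result formatting are illustrative:

```python
# Sketch of the agent wiring with a custom retrieval tool; assumes the
# `vectorstore` object built in the previous snippet.
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI
from langgraph.prebuilt import create_react_agent

@tool
def search_music_sets(query: str) -> str:
    """Search the Zouk set database for sets matching a mood or style."""
    docs = vectorstore.similarity_search(query, k=5)
    return "\n\n".join(
        f"{d.page_content}\nDJ: {d.metadata['dj']}\nLink: {d.metadata['url']}"
        for d in docs
    )

agent = create_react_agent(
    ChatOpenAI(model="gpt-4o-mini"),
    tools=[search_music_sets],
)

# The model decides per message whether to call the tool or answer directly.
response = agent.invoke(
    {"messages": [("user", "Find me some chill Zouk sets for a relaxed evening")]}
)
print(response["messages"][-1].content)
```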

Tool-calling enables intelligent decision making. Simple greetings don’t trigger database searches, but “find me some chill tracks” immediately activates the retrieval tool with optimized search parameters.

ChromaDB handles vector embeddings and local data persistence. LangGraph orchestrates multi-step reasoning, deciding when to search, how to search, and how to respond. Streamlit provides the web interface.
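
A stripped-down version of the Streamlit side might look like this, assuming the `agent` object defined above; the real app adds authentication and error handling on top:

```python
# Minimal sketch of the Streamlit chat front end; assumes the `agent`
# defined in the previous snippet.
import streamlit as st

st.title("AI Zouk Music Recommender")

# Keep the running conversation in session state.
if "history" not in st.session_state:
    st.session_state.history = []

# Replay prior turns so the chat persists across reruns.
for role, text in st.session_state.history:
    st.chat_message(role).write(text)

if prompt := st.chat_input("Describe the music you're looking for"):
    st.session_state.history.append(("user", prompt))
    st.chat_message("user").write(prompt)
    # Pass the full history so follow-up questions stay in context.
    result = agent.invoke({"messages": st.session_state.history})
    answer = result["messages"][-1].content
    st.session_state.history.append(("assistant", answer))
    st.chat_message("assistant").write(answer)
```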

Implementation Challenges

The primary technical challenge was maintaining context across conversation turns. When someone asks “What about something more upbeat?” after discussing chill music, the system must understand that “upbeat” is relative to the previous conversation context.

The solution involves maintaining conversation memory and feeding recent chat history into the search process. The approach isn’t perfect, but it makes interactions feel noticeably more natural.
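
One way to implement this with LangGraph is a checkpointer keyed by a thread id, so prior turns are replayed into each new call; this is a sketch, not necessarily the app’s exact approach, and it reuses the retrieval tool defined earlier:

```python
# Sketch of conversation memory via a LangGraph checkpointer. With a
# thread_id, earlier messages are fed back in on each turn, so a
# follow-up like "something similar but with more vocals" resolves
# against the previous recommendations.
from langchain_openai import ChatOpenAI
from langgraph.checkpoint.memory import MemorySaver
from langgraph.prebuilt import create_react_agent

agent = create_react_agent(
    ChatOpenAI(model="gpt-4o-mini"),
    tools=[search_music_sets],   # tool defined in an earlier snippet
    checkpointer=MemorySaver(),  # keeps per-thread message history
)

config = {"configurable": {"thread_id": "user-123"}}
agent.invoke({"messages": [("user", "Find me some chill zouk sets")]}, config)
# Second turn: the checkpointer supplies the earlier messages as context.
agent.invoke({"messages": [("user", "Something similar but with more vocals")]}, config)
```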

User experience requirements included authentication for personalized sessions, error handling for system failures, and token usage tracking for performance monitoring.
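
For token tracking specifically, LangChain’s OpenAI callback is a simple starting point; a sketch, and the app’s actual instrumentation may differ:

```python
# Sketch of token-usage tracking with LangChain's OpenAI callback,
# a lightweight way to log tokens and cost per request.
from langchain_community.callbacks import get_openai_callback

with get_openai_callback() as cb:
    agent.invoke({"messages": [("user", "Upbeat sets for Saturday night")]})
print(f"tokens: {cb.total_tokens}, cost: ${cb.total_cost:.4f}")
```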

Learning Outcomes

This project required skills beyond typical analytics work: multi-agent AI systems, production deployment, and user interface design. However, core skills transferred effectively. Both analytics and music discovery involve processing unstructured text data and making it useful for users. The primary difference is the interface—conversation rather than dashboards and reports.

Product thinking became crucial when building user-facing applications. Technical correctness alone is insufficient. Error states, edge cases, and user frustration when AI fails to understand requests all require careful consideration.

The music domain provided ideal conditions for experimentation:

  • Music preferences are subjective—no single “correct” answer exists.
  • People naturally discuss and discover music through conversation.
  • The data is rich but messy, reflecting most real-world datasets.

Next Steps

The foundation works effectively, but several improvements are planned:

  • Expanding the database with additional genres and artists.
  • Learning from user interactions to enhance recommendations over time.
  • Incorporating audio analysis to complement text-based search.