Lesson 14 – How to Build an AI Movie Recommendation System in Python Using CSV (Beginner-Friendly Tutorial with Code)

Ever wondered how Netflix suggests the next movie or how Spotify recommends songs? That's the magic of recommendation systems, a key application of artificial intelligence and data science. In this beginner-friendly guide, you'll learn how to build your own movie recommendation system in Python using data from a CSV file. This step-by-step tutorial is ideal for anyone starting out with Python who is curious about machine learning or data science projects.

PYTHON

Leonardo Gomes Guidolin

4/29/2025 · 2 min read

🔍 What Is a Recommendation System?

A recommendation system suggests items to users based on their preferences or behavior. There are two main types:

  • Content-based: recommends items similar to those the user liked in the past.

  • Collaborative filtering: recommends items that users with similar tastes liked.

In this tutorial, we’ll create a content-based system using genres as features to recommend similar movies.

✅ Why Use Python for Recommendation Systems?

Python is a great fit for beginners because of its:

  • Simple and readable syntax

  • Powerful data science libraries like pandas, scikit-learn, and NumPy

  • Massive community and open-source support

Let’s dive in.

🗂️ Step 1: Prepare Your CSV Dataset

Create a CSV file called movies.csv with the following content:

```csv
Movie,Action,Adventure,Animation,Sci-Fi
The Matrix,1,1,0,1
John Wick,1,0,0,0
Toy Story,0,0,1,0
Finding Nemo,0,1,1,0
The Lion King,0,1,1,0
```


Each column after Movie represents a genre. The values 1 or 0 indicate whether the movie belongs to that genre.

You can expand this dataset later with more genres, ratings, or descriptions.
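If you would rather not type the file by hand, a minimal sketch like the one below writes the same movies.csv with pandas. It assumes pandas is already installed and writes (or overwrites) movies.csv in the current working directory.

```python
import pandas as pd

# Same rows as the CSV above: 1 means the movie belongs to that genre, 0 means it does not
data = {
    "Movie": ["The Matrix", "John Wick", "Toy Story", "Finding Nemo", "The Lion King"],
    "Action": [1, 1, 0, 0, 0],
    "Adventure": [1, 0, 0, 1, 1],
    "Animation": [0, 0, 1, 1, 1],
    "Sci-Fi": [1, 0, 0, 0, 0],
}

df = pd.DataFrame(data)

# index=False stops pandas from writing its 0..4 row index as an extra column
df.to_csv("movies.csv", index=False)

# Show the file contents to confirm the layout
with open("movies.csv") as f:
    print(f.read())
```

Building the table column by column keeps each genre's flags together, which makes it easy to add a new genre later as one more dictionary entry.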

🧰 Step 2: Install the Required Libraries

Run the command below in your terminal to install pandas and scikit-learn (NumPy is installed automatically as a dependency):

```bash
pip install pandas scikit-learn
```


🧠 Step 3: Import the Python Libraries

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity
```


We use:

  • pandas to handle the CSV file

  • cosine_similarity to measure similarity between movies
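Cosine similarity treats each movie as a vector of genre flags and computes cos(a, b) = (a · b) / (‖a‖ ‖b‖): the dot product divided by the product of the vector lengths. With non-negative 0/1 features the result ranges from 0 (no genres in common) to 1 (the same genre profile). The sketch below, an illustration rather than part of the tutorial's pipeline, computes the score for The Matrix and John Wick by hand with NumPy and compares it with scikit-learn's cosine_similarity.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Genre vectors in the column order Action, Adventure, Animation, Sci-Fi
matrix = np.array([1, 1, 0, 1])
john_wick = np.array([1, 0, 0, 0])

# Dot product divided by the product of the vector lengths
manual = matrix @ john_wick / (np.linalg.norm(matrix) * np.linalg.norm(john_wick))

# scikit-learn expects 2-D input (one row per item), hence the extra brackets
library = cosine_similarity([matrix], [john_wick])[0, 0]

print(round(manual, 3), round(library, 3))  # 0.577 0.577
```

The two movies share only Action, so the dot product is 1, and dividing by √3 × 1 gives about 0.577.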

📊 Step 4: Load the Dataset

```python
df = pd.read_csv('movies.csv')
print(df.head())
```


Output:

```text
           Movie  Action  Adventure  Animation  Sci-Fi
0     The Matrix       1          1          0       1
1      John Wick       1          0          0       0
2      Toy Story       0          0          1       0
3   Finding Nemo       0          1          1       0
4  The Lion King       0          1          1       0
```
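Before computing similarities, it can help to sanity-check the data. The sketch below inlines the CSV text with io.StringIO so it runs on its own (swap in pd.read_csv('movies.csv') for the real file). The three checks are suggestions, not requirements of the tutorial: genre cells must be 0 or 1, titles must be unique because the recommender looks movies up by name, and every movie needs at least one genre, since scikit-learn scores an all-zero row as 0 against every movie.

```python
import io
import pandas as pd

# Stand-in for movies.csv so this sketch runs on its own
csv_text = """Movie,Action,Adventure,Animation,Sci-Fi
The Matrix,1,1,0,1
John Wick,1,0,0,0
Toy Story,0,0,1,0
Finding Nemo,0,1,1,0
The Lion King,0,1,1,0
"""
df = pd.read_csv(io.StringIO(csv_text))

# Every column except the title is treated as a genre flag
genre_cols = df.columns.drop("Movie")

# Every genre cell should be exactly 0 or 1
assert df[genre_cols].isin([0, 1]).all().all(), "genre columns must contain only 0/1"
# Titles must be unique because the recommender looks movies up by name
assert df["Movie"].is_unique, "duplicate movie titles found"
# An all-zero genre row would score 0 against every movie
assert (df[genre_cols].sum(axis=1) > 0).all(), "every movie needs at least one genre"

print(df.shape)  # (5, 5)
```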

📐 Step 5: Calculate Movie Similarities

```python
features = df.drop('Movie', axis=1)
similarity = cosine_similarity(features)
```


This creates a 5×5 similarity matrix comparing every movie with every other movie based on genre features: entry [i][j] is the cosine similarity between movie i and movie j.
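A raw NumPy array is hard to read, so one option is to wrap it in a DataFrame labeled with the movie titles. The sketch below inlines the same data so it runs standalone. The matrix is symmetric, and its diagonal is 1.0 (up to floating-point rounding) because every movie is identical to itself.

```python
import io
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Same data as movies.csv, inlined so the sketch runs on its own
csv_text = """Movie,Action,Adventure,Animation,Sci-Fi
The Matrix,1,1,0,1
John Wick,1,0,0,0
Toy Story,0,0,1,0
Finding Nemo,0,1,1,0
The Lion King,0,1,1,0
"""
df = pd.read_csv(io.StringIO(csv_text))

features = df.drop('Movie', axis=1)
similarity = cosine_similarity(features)

# Label rows and columns with titles so sim_df.loc[a, b] reads as "similarity of a to b"
sim_df = pd.DataFrame(similarity, index=df['Movie'], columns=df['Movie'])
print(sim_df.round(2))
```

Reading along the The Matrix row, you can see directly which movies it overlaps with and by how much.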

🤖 Step 6: Build the Recommendation Function

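```python
def recommend(movie_name):
    # Stop early if the title is not in the dataset
    if movie_name not in df['Movie'].values:
        print("Movie not found in the dataset.")
        return

    # Row index of the requested movie
    idx = df[df['Movie'] == movie_name].index[0]

    # Pair each movie's row index with its similarity score, most similar first
    scores = list(enumerate(similarity[idx]))
    scores = sorted(scores, key=lambda x: x[1], reverse=True)

    # Skip the movie itself by index: a movie with an identical genre
    # profile ties with it and could otherwise be sorted ahead of it
    scores = [s for s in scores if s[0] != idx]

    print(f"\nMovies similar to '{movie_name}':\n")
    for movie_idx, _ in scores[:3]:
        print(f"- {df.iloc[movie_idx]['Movie']}")
```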

🧪 Step 7: Test the Recommendation System

```python
recommend("The Matrix")
```

Output:

```text
Movies similar to 'The Matrix':

- John Wick
- Finding Nemo
- The Lion King
```


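Even though "The Lion King" and "Finding Nemo" may seem unrelated to "The Matrix", they share the Adventure genre with it, so they score higher than "Toy Story", which shares no genre with it. The two tie because their genre vectors are identical, and Python's stable sort keeps tied movies in dataset order.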
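To check this, a short sketch can print The Matrix's similarity scores as a labeled, ranked pandas Series. The data is inlined so it runs standalone. John Wick scores about 0.577, Finding Nemo and The Lion King tie at about 0.408, and Toy Story scores 0. Sorting with kind='stable' keeps the tied pair in their original row order.

```python
import io
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Same data as movies.csv, inlined so the sketch runs on its own
csv_text = """Movie,Action,Adventure,Animation,Sci-Fi
The Matrix,1,1,0,1
John Wick,1,0,0,0
Toy Story,0,0,1,0
Finding Nemo,0,1,1,0
The Lion King,0,1,1,0
"""
df = pd.read_csv(io.StringIO(csv_text))
similarity = cosine_similarity(df.drop('Movie', axis=1))

idx = df[df['Movie'] == 'The Matrix'].index[0]

# One score per movie, labeled by title, with The Matrix itself removed
scores = pd.Series(similarity[idx], index=df['Movie']).drop('The Matrix')

# Most similar first; the stable sort keeps ties in dataset order
scores = scores.sort_values(ascending=False, kind='stable')
print(scores.round(3))
```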

📌 Bonus Tips for Beginners

  • You can expand the CSV to include more genres, ratings, or user reviews.

  • Try recommending based on plot keywords or descriptions using NLP.

  • Add a web interface with Flask or Django for interaction.
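For the NLP tip, one common approach (not the only one) is to turn text into TF-IDF vectors with scikit-learn's TfidfVectorizer and reuse the same cosine similarity. The keyword strings below are hypothetical and made up purely for illustration; replace them with real plot descriptions. Only movies that share at least one keyword get recommended.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical keyword strings for illustration only; use real descriptions in practice
titles = ["The Matrix", "John Wick", "Toy Story", "Finding Nemo", "The Lion King"]
keywords = [
    "hacker simulation rebellion martial arts future",
    "hitman revenge gunfights underworld",
    "toys friendship rivalry child bedroom",
    "ocean fish father son search",
    "lion cub father kingdom exile",
]

# Turn each text into a TF-IDF vector; common English words are dropped
vectors = TfidfVectorizer(stop_words="english").fit_transform(keywords)

# Same cosine similarity as before, now on word vectors instead of genre flags
similarity = cosine_similarity(vectors)

idx = titles.index("Finding Nemo")

# Rank every other movie by similarity, most similar first
ranked = sorted(
    (i for i in range(len(titles)) if i != idx),
    key=lambda i: similarity[idx, i],
    reverse=True,
)

# Keep only movies that share at least one keyword
recommendations = [titles[i] for i in ranked[:3] if similarity[idx, i] > 0]
print(recommendations)
```

With these made-up keywords, "father" is the only word Finding Nemo shares with another entry, so The Lion King is its only recommendation.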

🧠 What You Learned

  • How to load and analyze CSV data using Python

  • How to use cosine similarity to compare content

  • How to build a basic content-based recommendation engine

This project is a great foundation for moving into more advanced machine learning topics!