
    How to Extract YouTube Analytics Data and Analyze in Python

    March 26, 2025

    If you’re a YouTube content creator, you’ll want to make data-driven decisions when posting content. This helps you target the right audience when creating your videos.

    YouTube Studio provides YouTube Analytics, where you can get comprehensive data about your channel. But there is a caveat: most of the statistics provided by YouTube Analytics are descriptive, not predictive. Information like projected views and subscriber counts, or the factors that influence watch time and earnings, is not available, so you’ll need to calculate these yourself.

    In this article, you’ll learn how to export data from YouTube Analytics to Python so you can analyze it further or create visualizations. You can even build your own custom dashboard using various Python libraries like Streamlit, Shiny, or Dash.

    Here’s what we’ll cover:

    • Prerequisites

    • Step 1: Identify the Problem Statement

    • Step 2: Extract the Data

    • Step 3: Analyze the Data in Python

      • Correlation Analysis

      • Audience Retention Analysis

    • Conclusion

    Prerequisites

    • Active YouTube and YouTube Studio Account

    • Jupyter Notebook, Google Colab, Kaggle, or any other environment that supports Python

    • Pandas library installed

    • Seaborn library installed

    • Matplotlib library installed

    Step 1: Identify the Problem Statement

    Before proceeding, we need to know what we’re looking for – because YouTube Analytics has many metrics, and this can get overwhelming. My channel doesn’t have a ton of subscribers, but I have quite a few videos and views. So we’ll use my data as an example.

    Note that the analysis in this tutorial is specific to my channel, and results vary from channel to channel. You can use the same techniques to answer the same or similar questions with your own data, but your results will differ from mine.

    Here are the questions I’d like to answer:

    1. Correlation Analysis
    • Views and watch time – Are longer watch times associated with higher views?

    • Views and subscribers – Do more views translate to more subscribers?

    • Impressions and Click-Through Rate (CTR%) – Do more impressions lead to a higher click-through rate?

    • Watch time and average view duration – Do videos with more total watch time also hold viewers longer per view?

    2. Audience Retention Analysis
    • Average view duration vs. Video length – Are longer videos watched in full?

    • Drop-off points – Which duration range has the best retention?

    • Retention Rate (%) – What share of each video does the average viewer watch (average view duration divided by duration)?

    Step 2: Extract the Data

    Sign in to your YouTube Studio account, go to the Analytics tab, and click Advanced mode.

    Image showing YouTube Analytics Dashboard and the Advanced Mode

    This will open a dashboard showing comprehensive descriptive analytics of your YouTube channel. This can get overwhelming, as there are a lot of metrics and filters with various types of data. This is why I emphasized the importance of knowing your problem and identifying your questions before diving in.

    You can select the range of data you are interested in using the date dropdown (1 in the image below) and the Compare to button (2) to compare data from different date ranges.

    Image showing the date dropdown and the Compare to button

    The column headers you see in the dashboard are the filters. Each contains different metrics, and some metrics appear under more than one filter. You can play around with the tabs and dropdowns to understand them better.

    This is just a foundation for understanding your YouTube channel performance. If you have a long-running channel with a large number of subscribers and views, trust me – you can get a lot of insights from your data.

    For this tutorial, I will select my entire lifetime data (1) and click the download button at the top right-hand corner (2).

    Image showing the lifetime option under the date dropdown

    This will display two options: open the data in Google Sheets in a new tab, or download it as a CSV file.

    Image showing the download options to open the data in a google sheets new tab or download the csv

    Since we want to use the data in Python, select the option to download the CSV file. After downloading the file, extract the files from the zip folder, and inside the extracted folder, you will see three CSV files: Chart data.csv, Table data.csv, and Totals.csv.

    For this tutorial, we are interested in Table data.csv. Open it in Excel to do some manual data cleaning before importing it into Python.

    Image showing the Table data in Excel

    The data lists all forty videos on my YouTube channel (yours might have more or fewer). Remove the first row, which is the Total row, and save the changes.
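If you would rather not edit the file by hand, the same cleanup can be sketched in pandas. This is a minimal sketch, assuming the summary row carries the label Total in the Content column (check your own file). The sample rows and the `drop_total_row` helper are made up for illustration.

```python
import io

import pandas as pd

# Made-up sample that mimics the layout of "Table data.csv".
# Assumption: the summary row has "Total" in the Content column.
sample_csv = io.StringIO(
    "Content,Video title,Duration,Views\n"
    "Total,,,1500\n"
    "abc123,First video,300,1000\n"
    "def456,Second video,600,500\n"
)


def drop_total_row(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of the frame without the summary 'Total' row."""
    is_total = frame["Content"].astype(str).str.strip().str.lower() == "total"
    return frame.loc[~is_total].reset_index(drop=True)


raw = pd.read_csv(sample_csv)
videos = drop_total_row(raw)
print(videos)
```

Doing the cleanup in code keeps the downloaded file untouched, so the step can be repeated on a fresh export.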

    Here are the columns in the dataset:

    • Content: The video id

    • Video title: The video title

    • Video publish time: The day the video was published

    • Duration: The video duration in seconds

    • Views: The number of views per video

    • Watch time: The estimated amount of video watch time by your audience in hours

    • Subscribers: Change in total subscribers found by subtracting subscribers lost from subscribers gained for the selected date and region.

    • Average view duration: Estimated average time watched per view, exported in H:M:S format.

    • Impressions: Number of times your videos were shown to viewers.

    • Impressions click-through rate (%): The percentage of impressions that led to a view, that is, how often viewers watched a video after seeing an impression.

    Step 3: Analyze the Data in Python

    Go to your Jupyter Notebook and import the Pandas, Seaborn, and Matplotlib libraries.

    import pandas as pd
    import seaborn as sns
    import matplotlib.pyplot as plt
    

    Next, import the Table data.csv file.

    # Load data
    df = pd.read_csv("/content/Table data.csv")
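Before going further, it can help to confirm that the file has the columns the rest of this tutorial relies on. A minimal sketch follows. The `missing_columns` helper is hypothetical, and the column names are the ones used in the code in this article, which may differ from the headers in your export.

```python
import pandas as pd

# Column names used by the code in this tutorial.
EXPECTED_COLUMNS = [
    "Video title",
    "Duration",
    "Views",
    "Watch time (hours)",
    "Subscribers",
    "Average view duration",
    "Impressions",
    "Impressions click-through rate (%)",
]


def missing_columns(frame: pd.DataFrame, expected=EXPECTED_COLUMNS) -> list:
    """Return the expected column names that are absent from the frame."""
    return [name for name in expected if name not in frame.columns]


# Tiny made-up frame that lacks most of the columns, for illustration only.
demo = pd.DataFrame({"Video title": ["Intro"], "Duration": [300], "Views": [120]})
print(missing_columns(demo))
```

With the real data, `missing_columns(df)` should return an empty list. If it does not, the header names in your export differ from the ones used here.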
    

    Correlation Analysis

    Following our problem statement, we are going to plot a correlation heatmap of the following variables: Views, Watch time (hours), Subscribers, Average view duration, Impressions, and Impressions click-through rate (%). This shows the strength and direction of the relationship between each pair.

    # Convert "Average view duration" (formatted as H:M:S) to seconds
    df['Average view duration'] = pd.to_timedelta(df['Average view duration']).dt.total_seconds()
    
    # Select relevant columns for correlation analysis
    correlation_data = df[['Views', 'Watch time (hours)', 'Subscribers', 'Average view duration', 'Impressions', 'Impressions click-through rate (%)']]
    
    # Compute correlation matrix
    corr_matrix = correlation_data.corr()
    
    # Visualization using a heatmap
    plt.figure(figsize=(10, 6))
    sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', fmt=".2f", linewidths=0.5)
    plt.title("YouTube Analytics Correlation Heatmap")
    plt.show()
    

    Correlation heatmap showing the relationship between the selected variables

    The correlation coefficient ranges from -1 to 1. Values below 0 indicate a negative relationship, and values above 0 indicate a positive one. The closer the value is to -1 or 1, the stronger the relationship, while values near 0 indicate little or no linear relationship.
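To make the coefficient concrete, here is a small sketch that computes Pearson's r by hand, which is the default method that `DataFrame.corr()` uses. The `pearson_r` helper is written for illustration, and the numbers are made up rather than taken from my channel.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of the spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Made-up numbers, for illustration only.
views = [100, 200, 300, 400, 500]
watch_hours = [2.0, 4.1, 5.9, 8.2, 9.8]  # rises with views
ctr_percent = [9.0, 7.5, 6.0, 4.5, 3.0]  # falls steadily as views rise

print(round(pearson_r(views, watch_hours), 2))  # close to +1
print(round(pearson_r(views, ctr_percent), 2))  # perfectly linear decline, so -1.0
```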

    Based on the plot above, here are the key insights:

    • Views and watch time: There’s a strong correlation (0.94) between views and watch time, suggesting that as videos get more views, they also accumulate more watch hours.

    • Views and impressions: There’s a strong correlation (0.89) between views and impressions, indicating that videos that are shown more frequently in recommendations and search results tend to get more views.

    • Average view duration: This metric has very weak correlations with almost all other metrics. This is particularly notable for views (0.06), subscribers (0.01), and impressions (0.03).

    • Subscribers and other metrics: Subscribers have a moderate to strong correlation with views (0.75) and impressions (0.79) and a weaker correlation with click-through rate (0.54).

    • Click-through rate: Has moderate correlations with views (0.69) and watch time (0.66) but a weaker correlation with subscribers (0.54).

    The most significant insight is that average view duration appears to operate independently from other metrics. This suggests that on my YouTube channel, a video’s ability to retain viewers throughout its length isn’t necessarily connected to how many people watch it, how often it’s recommended, or how many subscribers it gains.

    This implies that the strategies I would implement to increase my views, subscribers, and impressions might differ from those needed to improve average view duration, an important factor in YouTube’s recommendation algorithm. This means I need to look at other YouTube metrics that have a relationship with average view duration, which is a topic for another article.
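Reading a heatmap cell by cell gets tedious as the number of metrics grows. One way to list the pairs programmatically is sketched below. The `ranked_pairs` helper and the three-column sample are hypothetical. With the real data you would pass in the `corr_matrix` computed earlier.

```python
import itertools

import pandas as pd


def ranked_pairs(corr: pd.DataFrame) -> list:
    """Return (column_a, column_b, r) tuples, strongest relationship first."""
    pairs = [
        (a, b, float(corr.loc[a, b]))
        for a, b in itertools.combinations(corr.columns, 2)
    ]
    return sorted(pairs, key=lambda pair: abs(pair[2]), reverse=True)


# Made-up numbers, for illustration only.
demo = pd.DataFrame(
    {
        "Views": [100, 200, 300, 400, 500],
        "Watch time (hours)": [2.0, 4.1, 5.9, 8.2, 9.8],
        "Average view duration": [95, 120, 90, 110, 100],
    }
)

for a, b, r in ranked_pairs(demo.corr()):
    print(f"{a} vs {b}: {r:.2f}")
```

Sorting by absolute value puts strong negative relationships next to strong positive ones, since both are worth a closer look.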

    Audience Retention Analysis

    To analyze audience retention, we need to create a new variable Retention Rate (%), which is calculated by dividing a video’s Average view duration by the Duration and expressing it as a percentage.

    
    # Calculate retention rate as (Average View Duration / Total Video Duration) * 100
    df['Retention Rate (%)'] = (df['Average view duration'] / df['Duration']) * 100
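As a quick sanity check on the formula, here is a worked example in plain Python. The `hms_to_seconds` and `retention_rate` helpers and the sample values are made up for illustration. They mirror the H:M:S conversion and the division done in pandas in this tutorial.

```python
def hms_to_seconds(text: str) -> int:
    """Convert an 'H:M:S' string such as '0:02:30' to whole seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def retention_rate(average_view_duration: str, duration_seconds: float) -> float:
    """Average view duration as a percentage of the full video length."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return hms_to_seconds(average_view_duration) / duration_seconds * 100


# A 10-minute video (600 seconds) watched for 2.5 minutes on average.
print(retention_rate("0:02:30", 600))  # 25.0
```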
    

    Next, sort the videos in descending order of Retention Rate (%) and display the 10 videos with the highest retention rate.

    # Sort videos by retention rate
    df_sorted = df.sort_values(by='Retention Rate (%)', ascending=False)
    
    # Display top 10 videos with highest retention
    df_sorted[['Video title', 'Duration', 'Average view duration', 'Retention Rate (%)']].head(10)
    

    Image showing top ten videos by retention rate

    From the table above, you will notice that most of the videos in the top 10 are no longer than 503 seconds, which is a little over 8 minutes. This suggests that my audience is interested in short to mid-length videos.

    Most of the videos with a high retention rate are under 4 minutes long, with retention rates ranging from 27% to 40%. With this insight, I can aim to keep my next uploads within 5 to 8 minutes.

    Let’s take a look at the bottom 10 videos with a low retention rate:

    # Sort videos by retention rate, highest first
    df_sorted = df.sort_values(by='Retention Rate (%)', ascending=False)
    
    # Display the bottom 10 videos, which have the lowest retention
    df_sorted[['Video title', 'Duration', 'Average view duration', 'Retention Rate (%)']].tail(10)
    

    Image showing bottom ten videos by retention rate

    From the above information, you will notice that long videos in my channel spanning approximately 22 – 58 minutes have a low retention rate. This further supports the claim above that my audience is more interested in shorter videos.
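The two tables only show the extremes. To see which duration range retains viewers best across all videos, one option is to bucket the videos by length and average the retention rate per bucket. This is a sketch with made-up numbers. The bucket edges (5, 10, and 30 minutes) are arbitrary assumptions and should be adjusted to your own catalog.

```python
import pandas as pd

# Made-up numbers, for illustration only.
demo = pd.DataFrame(
    {
        "Duration": [120, 240, 450, 520, 900, 1500, 2400, 3500],
        "Retention Rate (%)": [40.0, 36.0, 30.0, 26.0, 20.0, 14.0, 12.0, 10.0],
    }
)

# Assumed bucket edges in seconds: under 5 min, 5-10 min, 10-30 min, over 30 min.
edges = [0, 300, 600, 1800, float("inf")]
labels = ["< 5 min", "5-10 min", "10-30 min", "> 30 min"]
demo["Duration range"] = pd.cut(demo["Duration"], bins=edges, labels=labels)

# Average retention and number of videos in each bucket.
summary = (
    demo.groupby("Duration range", observed=True)["Retention Rate (%)"]
    .agg(["mean", "count"])
)
print(summary)
```

The count column matters as much as the mean: an average over one or two videos says little about the bucket as a whole.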

    We can also create a scatter plot of Duration against Retention Rate (%) to summarize the tables above.

    # Set style for plots
    sns.set_style("whitegrid")
    
    # Plot Retention Rate vs. Video Duration
    plt.figure(figsize=(12, 6))
    
    sns.scatterplot(data=df, x='Duration', y='Retention Rate (%)', hue='Views', size='Views', sizes=(20, 200), palette='coolwarm')
    plt.title("Audience Retention vs. Video Duration")
    plt.xlabel("Video Duration (seconds)")
    plt.ylabel("Retention Rate (%)")
    plt.legend(title="Views", loc="upper right")
    
    plt.show()
    

    Scatter plot showing audience retention against video duration

    The scatter plot above shows the relationship between audience retention rate (y-axis, measured as a percentage) and video duration (x-axis, measured in seconds) for various videos. Here are the key observations:

    • There’s a clear negative correlation between video duration and retention rate – as videos get longer, the retention rate generally decreases.

    • The highest retention rates (35-40%) are found in shorter videos, mostly under 500 seconds (around 8 minutes).

    • Videos over 1500 seconds (25 minutes) consistently show retention rates below 15%.

    • The size and color of the dots represent the number of views, with larger, redder dots indicating more views (up to 1000) and smaller, blue dots representing fewer views (around 200).

    • Interestingly, some mid-length videos (around 500 seconds) have both higher view counts (indicated by larger red dots) and decent retention rates of about 25%.

    • The longest video in the dataset (at around 3500 seconds or 58 minutes) has a retention rate of about 14% and relatively few views.

    This plot further confirms the claim that shorter videos tend to better maintain audience attention on my channel, though some mid-length videos can still perform well in terms of both retention and view count.
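The trend in the scatter plot can also be summarized with a single number. A rank-based (Spearman) correlation is one reasonable choice because it does not assume a straight-line relationship. That choice is an assumption of this sketch, not something the plot requires. The sketch below uses made-up numbers and computes it as the Pearson correlation of the ranks.

```python
import pandas as pd

# Made-up numbers, for illustration only.
demo = pd.DataFrame(
    {
        "Duration": [120, 240, 450, 520, 900, 1500, 2400, 3500],
        "Retention Rate (%)": [40.0, 36.0, 30.0, 32.0, 20.0, 14.0, 12.0, 13.0],
    }
)

# Spearman correlation is the Pearson correlation of the ranks.
duration_rank = demo["Duration"].rank()
retention_rank = demo["Retention Rate (%)"].rank()
spearman = duration_rank.corr(retention_rank)

print(round(spearman, 2))
```

A value close to -1 would back up the visual impression that retention falls as videos get longer. A value near 0 would suggest the pattern is weaker than it looks.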

    Conclusion

    What we’ve learned from my data is just the tip of the iceberg. YouTube has many metrics, and because my channel is not monetized and has few subscribers and videos, I don’t have data on monetization, demographics, and other metrics.

    But after reading this article, I hope you can think of many more questions to answer with these metrics. You can even forecast your views, subscriber counts, and revenue for the coming days or months. You can also perform a multivariate time series analysis to see how these factors affect your primary variable of interest.

    If you find this article interesting, don’t forget to check out my blog for other interesting articles, follow me on Medium, connect on LinkedIn, and subscribe to my YouTube channel.

    Source: freeCodeCamp Programming Tutorials: Python, JavaScript, Git & More 
