
    A Code Implementation of Using Atla’s Evaluation Platform and Selene Model via Python SDK to Score Legal Domain LLM Outputs for GDPR Compliance

    March 31, 2025

    In this tutorial, we demonstrate how to evaluate the quality of LLM-generated responses using Atla’s Python SDK, a tool for automating evaluation workflows with natural-language criteria. Using Selene, Atla’s evaluator model, we check whether legal responses align with the principles of the GDPR (General Data Protection Regulation). Atla’s platform supports programmatic assessments with custom or predefined criteria, and the official Atla SDK provides both synchronous and asynchronous clients.

    In this implementation, we did the following:

    • Used custom GDPR evaluation logic
    • Queried Selene to return binary scores (0 or 1) and human-readable critiques
    • Ran the evaluations concurrently as a batch using asyncio
    • Printed critiques to understand the reasoning behind each judgment

    The Colab-compatible setup has few dependencies: the Atla SDK (the atla package), pandas, and nest_asyncio.

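    !pip install atla pandas nest_asyncio --quiet
    
    import os
    import asyncio
    
    import nest_asyncio
    import pandas as pd
    from atla import Atla, AsyncAtla
    
    # Read the key from the environment if it is set; otherwise replace the placeholder.
    ATLA_API_KEY = os.environ.get("ATLA_API_KEY", "your atla API key")
    
    client = Atla(api_key=ATLA_API_KEY)             # synchronous client
    async_client = AsyncAtla(api_key=ATLA_API_KEY)  # asynchronous client used below
    
    # Allow asyncio.run() to be called inside the notebook's already-running event loop.
    nest_asyncio.apply()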
    

    First, we install the required libraries and initialize synchronous and asynchronous Atla clients with your API key. We apply nest_asyncio because Jupyter and Colab already run an event loop, and a plain asyncio.run() call fails inside a running loop; after the patch, coroutines from the AsyncAtla client can be run directly from a notebook cell.
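    To illustrate why the patch is needed, here is a minimal standard-library sketch (no Atla calls; fetch_score and notebook_cell are hypothetical names) that mimics a notebook cell. Because an event loop is already running inside notebook_cell, calling asyncio.run() there raises a RuntimeError, which is the failure nest_asyncio.apply() is meant to avoid.

```python
import asyncio


async def fetch_score():
    # Hypothetical stand-in for an evaluator call; no network access.
    return 1


async def notebook_cell():
    # Inside this coroutine an event loop is already running,
    # which mirrors the situation in Jupyter and Colab.
    coro = fetch_score()
    try:
        return asyncio.run(coro)
    except RuntimeError as exc:
        coro.close()  # prevent a "coroutine was never awaited" warning
        return f"RuntimeError: {exc}"


outcome = asyncio.run(notebook_cell())
print(outcome)
```

    Without nest_asyncio, the usual notebook alternative is to await the coroutine directly in the cell instead of wrapping it in asyncio.run().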

    data = [
        {
            "question": "Can a company monitor employee emails under GDPR?",
            "llm_response": "Yes, any employer can freely monitor emails as long as it's for productivity.",
            "expected": 0
        },
        {
            "question": "Can employers access private chats on company devices?",
            "llm_response": "Only if there is a legitimate business need and employees are informed.",
            "expected": 1
        },
        {
            "question": "Can browsing history be stored under EU privacy law?",
            "llm_response": "Yes, but consent and transparency are required.",
            "expected": 1
        },
        {
            "question": "Can employers check WhatsApp messages on personal phones?",
            "llm_response": "No, personal device content is protected unless explicitly authorized.",
            "expected": 1
        },
    ]
    
    
    df = pd.DataFrame(data)
    df.head()
    

    We define a small dataset of legal questions and LLM-generated responses about GDPR compliance. Each entry includes an expected binary label (1 for compliant, 0 for non-compliant). The data is loaded into a pandas DataFrame for easy processing and evaluation.
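    Because the evaluation step reads the question and llm_response fields, and the expected label is meant to be binary, a quick sanity check before calling the API can catch malformed rows early. The sketch below is a hypothetical helper (validate_rows is not part of the Atla SDK) that works on the plain list of dictionaries; the malformed sample row is made up for illustration.

```python
REQUIRED_KEYS = ("question", "llm_response", "expected")


def validate_rows(rows):
    """Return a list of problems found in the dataset (empty if none)."""
    problems = []
    for i, row in enumerate(rows):
        missing = [key for key in REQUIRED_KEYS if key not in row]
        if missing:
            problems.append(f"row {i}: missing {', '.join(missing)}")
            continue  # the remaining checks need all keys present
        if row["expected"] not in (0, 1):
            problems.append(f"row {i}: expected must be 0 or 1, got {row['expected']!r}")
        if not str(row["llm_response"]).strip():
            problems.append(f"row {i}: empty llm_response")
    return problems


sample = [
    {
        "question": "Can a company monitor employee emails under GDPR?",
        "llm_response": "Yes, any employer can freely monitor emails as long as it's for productivity.",
        "expected": 0,
    },
    # Hypothetical malformed row: blank answer and a non-binary label.
    {"question": "Hypothetical malformed row", "llm_response": "   ", "expected": 2},
]
for problem in validate_rows(sample):
    print(problem)
```

    Running validate_rows(data) before building the DataFrame would return an empty list for the four examples in this tutorial.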

    custom_eval_criteria = """
    Score this 1 if the response complies with GDPR principles:
    - lawful basis
    - employee consent or notice
    - data minimization
    - legitimate interest
    Otherwise, score it 0.
    
    Explain briefly why it qualifies or not.
    """
    

    We define a custom evaluation prompt that guides Atla’s Selene model in scoring responses based on key GDPR principles. It instructs the model to assign a score of 1 for compliant answers and 0 otherwise, along with a brief explanation justifying the score.
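    If you want to reuse the same 0/1 structure for other principles, the prompt can be assembled from a list. The helper below is a hypothetical sketch that only builds a string and never calls Atla; with the default arguments it produces essentially the same prompt text as the hand-written criteria above.

```python
def build_binary_criteria(principles, framework="GDPR", explain=True):
    """Compose a 0/1 evaluation prompt from a list of principles.

    Hypothetical helper: it only builds a string and does not call Atla.
    """
    lines = [f"Score this 1 if the response complies with {framework} principles:"]
    lines += [f"- {principle}" for principle in principles]
    lines.append("Otherwise, score it 0.")
    if explain:
        # Ask the evaluator for a short justification alongside the score.
        lines += ["", "Explain briefly why it qualifies or not."]
    return "\n".join(lines)


gdpr_principles = [
    "lawful basis",
    "employee consent or notice",
    "data minimization",
    "legitimate interest",
]
criteria = build_binary_criteria(gdpr_principles)
print(criteria)
```

    The resulting string could be passed as evaluation_criteria in the evaluation call in place of the hand-written prompt, which keeps the list of principles easy to edit and version.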

    async def evaluate_with_selene(df):
        async def evaluate_row(row):
            try:
                result = await async_client.evaluation.create(
                    model_id="atla-selene",
                    model_input=row["question"],
                    model_output=row["llm_response"],
                    evaluation_criteria=custom_eval_criteria,
                )
                evaluation = result.result.evaluation
                return evaluation.score, evaluation.critique
            except Exception as e:
                # Keep the batch running; record the failure in the critique column.
                return None, f"Error: {e}"
    
        # One coroutine per row; gather() runs them concurrently and preserves row order.
        tasks = [evaluate_row(row) for _, row in df.iterrows()]
        results = await asyncio.gather(*tasks)
    
        # Split the (score, critique) pairs into two new columns.
        df["selene_score"], df["critique"] = zip(*results)
        return df
    
    
    df = asyncio.run(evaluate_with_selene(df))
    df.head()
    

    This asynchronous function evaluates each row of the DataFrame with Atla’s Selene model. For every question and LLM response pair, it submits the data together with the custom GDPR evaluation criteria. It then collects the scores and critiques concurrently with asyncio.gather, adds them to the DataFrame as the selene_score and critique columns, and returns the enriched DataFrame. A failed request does not stop the batch: its score is set to None and the error message is stored as the critique.
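    The code above sends every row at once, which is fine for four examples. For larger datasets you may want to cap the number of requests in flight, for instance if the API enforces rate limits. The sketch below wraps the same gather pattern in an asyncio.Semaphore; fake_evaluate is a hypothetical offline stub standing in for async_client.evaluation.create, so the example runs without an API key, and max_concurrency=2 is an arbitrary choice.

```python
import asyncio
import random

in_flight = {"now": 0, "peak": 0}  # bookkeeping to show the cap is respected


async def fake_evaluate(question, answer):
    """Hypothetical stub for async_client.evaluation.create(...); no network access."""
    await asyncio.sleep(random.uniform(0.0, 0.01))
    if not answer:
        raise ValueError("empty model_output")
    return 1, f"Stub critique for: {question}"


async def evaluate_all(pairs, max_concurrency=2):
    # The semaphore allows at most max_concurrency requests at the same time.
    semaphore = asyncio.Semaphore(max_concurrency)

    async def evaluate_one(question, answer):
        async with semaphore:
            in_flight["now"] += 1
            in_flight["peak"] = max(in_flight["peak"], in_flight["now"])
            try:
                return await fake_evaluate(question, answer)
            except Exception as exc:  # record the failure, keep the batch going
                return None, f"Error: {exc}"
            finally:
                in_flight["now"] -= 1

    # gather() returns results in input order, whichever request finishes first.
    return await asyncio.gather(*(evaluate_one(q, a) for q, a in pairs))


pairs = [("Q1", "A1"), ("Q2", ""), ("Q3", "A3")]
results = asyncio.run(evaluate_all(pairs))
for (question, _), (score, critique) in zip(pairs, results):
    print(question, score, critique)
```

    To use this pattern with the tutorial's code, the body of fake_evaluate would be replaced by the real evaluation call; the ordering and error handling stay the same.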

    # Print a readable summary of each judgment.
    for _, row in df.iterrows():
        print(f"\n🔹 Q: {row['question']}")
        print(f"🤖 A: {row['llm_response']}")
        print(f"🧠 Selene: {row['critique']} (Score: {row['selene_score']})")

    We iterate through the evaluated DataFrame and print each question, the corresponding LLM-generated answer, and Selene’s critique with its assigned score. This gives a human-readable summary of how the evaluator judged each response against the custom GDPR criteria.
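    The dataset also carries an expected label for each row, which the steps above never compare with Selene’s scores. A minimal sketch of that comparison follows; agreement_rate and is_missing are hypothetical helpers, and the scores in the example are made up, not real Selene outputs. The helper skips rows whose score is missing (None, or NaN if pandas converted it) and casts values with int() in case scores come back as strings, which is an assumption rather than documented SDK behavior.

```python
import math


def is_missing(value):
    # None marks a failed request; pandas may store it as NaN in a numeric column.
    return value is None or (isinstance(value, float) and math.isnan(value))


def agreement_rate(expected, scores):
    """Fraction of scored rows where the evaluator's score equals the expected label."""
    pairs = [(int(e), int(s)) for e, s in zip(expected, scores) if not is_missing(s)]
    if not pairs:
        return None  # nothing was scored
    matches = sum(e == s for e, s in pairs)
    return matches / len(pairs)


# Illustrative values only; these are not real Selene outputs.
expected_labels = [0, 1, 1, 1]
example_scores = ["0", "1", None, "0"]
print(agreement_rate(expected_labels, example_scores))
```

    With the DataFrame from this tutorial, the call would be agreement_rate(df["expected"].tolist(), df["selene_score"].tolist()).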

    In conclusion, this notebook showed how to use Atla’s evaluation capabilities to assess LLM-generated legal responses. With the Atla Python SDK and its Selene evaluator, we defined custom GDPR-specific evaluation criteria and automated the scoring of AI outputs, each accompanied by an interpretable critique. The evaluations ran asynchronously, and the whole workflow needs only a few dependencies and runs in Google Colab.



    The post A Code Implementation of Using Atla’s Evaluation Platform and Selene Model via Python SDK to Score Legal Domain LLM Outputs for GDPR Compliance appeared first on MarkTechPost.

