Talent.com
Evaluation Scenario Writer - AI Agent Testing Specialist

Mindrift
Muar, Johor, Malaysia
10 hours ago
Job description

Mindrift is looking for a freelance Agent Scenarios Designer based in the specified country. The role focuses on designing realistic and structured evaluation scenarios for LLM‑based agents, testing agent outputs, and refining tests. You will work on a flexible schedule and receive pay up to $38 / hr based on experience.

What We Do

The Mindrift platform, launched and powered by Toloka, connects domain experts with cutting‑edge AI projects from innovative tech clients. Our mission is to unlock the potential of GenAI by tapping into real‑world expertise from across the globe.

About the Role

You will design realistic and structured evaluation scenarios, create test cases that simulate human‑performed tasks, and define gold‑standard behavior to compare agent actions against. Your work will ensure each scenario is clearly defined, well‑scored, and easy to execute and reuse. You need a sharp analytical mindset, attention to detail, and an interest in how AI agents make decisions.

Responsibilities

  • Design structured test scenarios based on real‑world tasks
  • Define the golden path and acceptable agent behavior
  • Annotate task steps, expected outputs, and edge cases
  • Work with developers to test scenarios and improve clarity
  • Review agent outputs and adapt tests accordingly
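
As a rough illustration of the responsibilities above, the sketch below shows one way a scenario with a golden path might be recorded and an agent run scored against it. This is not Mindrift's format or tooling: the `Scenario` fields, the `score` function, the action names, and the set-based precision / recall scoring are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Scenario:
    """Hypothetical scenario record; field names are illustrative only."""
    name: str
    task: str
    golden_path: list[str]  # expected agent actions, in order
    acceptable_extras: set[str] = field(default_factory=set)  # tolerated extra actions


def score(scenario: Scenario, agent_actions: list[str]) -> dict[str, float]:
    """Compare the agent's actions with the golden path, treating both as sets."""
    expected = set(scenario.golden_path)
    # Extras the scenario explicitly tolerates do not count against the agent.
    taken = set(agent_actions) - scenario.acceptable_extras
    hits = expected & taken
    precision = len(hits) / len(taken) if taken else 0.0
    recall = len(hits) / len(expected) if expected else 0.0
    return {"precision": precision, "recall": recall}


# Made-up scenario and agent log, for illustration only.
scenario = Scenario(
    name="refund_request",
    task="Process a customer refund for a damaged item",
    golden_path=["look_up_order", "verify_damage_report", "issue_refund", "notify_customer"],
    acceptable_extras={"log_interaction"},
)
agent_actions = ["look_up_order", "log_interaction", "issue_refund", "notify_customer", "close_account"]

result = score(scenario, agent_actions)
print(result)
```

Here the agent skips one expected step and takes one unexpected action, so both scores drop below 1.0. A real scoring scheme would likely also check step order and output content, which this set-based sketch ignores.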

How to Get Started

Apply to this posting, qualify, and you’ll have the chance to contribute to projects aligned with your skills on your own schedule. From creating training prompts to refining model responses, you’ll help shape the future of AI while ensuring technology benefits everyone.

Requirements

  • Bachelor’s and / or Master’s degree in Computer Science, Software Engineering, Data Science / Analytics, Artificial Intelligence / Machine Learning, Computational Linguistics / NLP, Information Systems or related fields
  • Background in QA, software testing, data analysis, or NLP annotation
  • Good understanding of test design principles (e.g., reproducibility, coverage, edge cases)
  • Strong written communication skills in English
  • Comfortable with structured formats like JSON / YAML for scenario description
  • Able to define expected agent behaviors (gold paths) and scoring logic
  • Basic experience with Python and JavaScript
  • Curious and open to working with AI‑generated content, agent logs, and prompt‑based behavior
  • Ready to learn new methods, able to switch between tasks and topics quickly, and sometimes work with challenging, complex guidelines
  • Fully remote freelance role – only requires a laptop, internet connection, available time, and enthusiasm to take on a challenge

Nice to Have

  • Experience in writing manual or automated test cases
  • Familiarity with LLM capabilities and typical failure modes
  • Understanding of scoring metrics (precision, recall, coverage, reward functions)

Benefits

  • Get paid for your expertise, with rates up to $38 / hr depending on your skills, experience, and project needs
  • Participate in a flexible, remote, freelance project that fits around your primary professional or academic commitments
  • Gain valuable experience to enhance your portfolio through an advanced AI project
  • Influence how future AI models understand and communicate in your field of expertise