Talent.com
Evaluation Scenario Writer - AI Agent Testing Specialist

Mindrift • Kuantan, Pahang, Malaysia
2 days ago
Job description

Mindrift is looking for a freelance Agent Scenarios Designer based in the specified country. The role focuses on designing realistic, structured evaluation scenarios for LLM‑based agents, testing agent outputs, and refining tests. You will work on a flexible schedule, with pay of up to $38 / hr depending on experience.

What We Do

The Mindrift platform, launched and powered by Toloka, connects domain experts with cutting‑edge AI projects from innovative tech clients. Our mission is to unlock the potential of GenAI by tapping into real‑world expertise from across the globe.

About the Role

You will design realistic and structured evaluation scenarios, create test cases that simulate human‑performed tasks, and define gold‑standard behavior to compare agent actions against. Your work will ensure each scenario is clearly defined, well‑scored, and easy to execute and reuse. You need a sharp analytical mindset, attention to detail, and an interest in how AI agents make decisions.

Responsibilities

  • Design structured test scenarios based on real‑world tasks
  • Define the golden path and acceptable agent behavior
  • Annotate task steps, expected outputs, and edge cases
  • Work with developers to test scenarios and improve clarity
  • Review agent outputs and adapt tests accordingly

How to Get Started

Apply to this posting, qualify, and you’ll have the chance to contribute to projects aligned with your skills on your own schedule. From creating training prompts to refining model responses, you’ll help shape the future of AI while ensuring technology benefits everyone.

Requirements

  • Bachelor’s and / or Master’s degree in Computer Science, Software Engineering, Data Science / Analytics, Artificial Intelligence / Machine Learning, Computational Linguistics / NLP, Information Systems or related fields
  • Background in QA, software testing, data analysis, or NLP annotation
  • Good understanding of test design principles (e.g., reproducibility, coverage, edge cases)
  • Strong written communication skills in English
  • Comfortable with structured formats like JSON / YAML for scenario description
  • Can define expected agent behaviors (gold paths) and scoring logic
  • Basic experience with Python and JavaScript
  • Curious and open to working with AI‑generated content, agent logs, and prompt‑based behavior
  • Ready to learn new methods, able to switch between tasks and topics quickly, and sometimes work with challenging, complex guidelines
  • Fully remote freelance role – only requires a laptop, internet connection, available time, and enthusiasm to take on a challenge

Nice to Have

  • Experience in writing manual or automated test cases
  • Familiarity with LLM capabilities and typical failure modes
  • Understanding of scoring metrics (precision, recall, coverage, reward functions)

Benefits

  • Get paid for your expertise, with rates up to $38 / hr depending on your skills, experience, and project needs
  • Participate in a flexible, remote, freelance project that fits around your primary professional or academic commitments
  • Gain valuable experience to enhance your portfolio through an advanced AI project
  • Influence how future AI models understand and communicate in your field of expertise
