Mindrift is looking for a freelance Agent Scenarios Designer. The role focuses on designing realistic, structured evaluation scenarios for LLM-based agents, testing agent outputs, and refining tests. The work follows a flexible schedule and pays up to $38/hr, depending on experience.
What We Do
The Mindrift platform, launched and powered by Toloka, connects domain experts with cutting‑edge AI projects from innovative tech clients. Our mission is to unlock the potential of GenAI by tapping into real‑world expertise from across the globe.
About the Role
You will design realistic and structured evaluation scenarios, create test cases that simulate human‑performed tasks, and define gold‑standard behavior to compare agent actions against. Your work will ensure each scenario is clearly defined, well‑scored, and easy to execute and reuse. You need a sharp analytical mindset, attention to detail, and an interest in how AI agents make decisions.
Responsibilities
- Design structured test scenarios based on real‑world tasks
- Define the golden path and acceptable agent behavior
- Annotate task steps, expected outputs, and edge cases
- Work with developers to test scenarios and improve clarity
- Review agent outputs and adapt tests accordingly
How to Get Started
Apply to this posting, qualify, and you’ll have the chance to contribute to projects aligned with your skills on your own schedule. From creating training prompts to refining model responses, you’ll help shape the future of AI while ensuring technology benefits everyone.
Requirements
- Bachelor's and/or Master's degree in Computer Science, Software Engineering, Data Science/Analytics, Artificial Intelligence/Machine Learning, Computational Linguistics/NLP, Information Systems, or a related field
- Background in QA, software testing, data analysis, or NLP annotation
- Good understanding of test design principles (e.g., reproducibility, coverage, edge cases)
- Strong written communication skills in English
- Comfortable with structured formats like JSON/YAML for scenario description
- Can define expected agent behaviors (gold paths) and scoring logic (a brief illustrative sketch follows this list)
- Basic experience with Python and JavaScript
- Curious and open to working with AI-generated content, agent logs, and prompt-based behavior
- Ready to learn new methods, able to switch between tasks and topics quickly, and sometimes work with challenging, complex guidelines
- Fully remote freelance role – only requires a laptop, internet connection, available time, and enthusiasm to take on a challenge
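To give a flavor of this kind of work, here is a minimal, hypothetical sketch of a scenario description with a gold path and a toy scoring rule. The field names (`id`, `gold_path`, `edge_cases`) and the `score_run` function are illustrative assumptions, not Mindrift's actual format or tooling.

```python
# Illustrative sketch only: the scenario fields, gold path, and scoring rule below
# are hypothetical examples, not Mindrift's actual format or tooling.
import json

scenario = {
    "id": "demo-001",                      # hypothetical scenario identifier
    "task": "Find the cheapest direct flight and report its price.",
    "gold_path": [                         # expected sequence of agent actions
        "search_flights",
        "filter_direct",
        "sort_by_price",
        "report_price",
    ],
    "expected_output": {"currency": "USD"},
    "edge_cases": ["no direct flights available"],
}

def score_run(agent_steps, gold_path):
    """Toy coverage score: fraction of gold-path steps that appear in the agent run."""
    covered = sum(1 for step in gold_path if step in agent_steps)
    return covered / len(gold_path)

# Example: an agent run that skips the sorting step covers 3 of 4 gold steps (0.75).
agent_steps = ["search_flights", "filter_direct", "report_price"]
print(json.dumps(scenario, indent=2))
print("score:", score_run(agent_steps, scenario["gold_path"]))
```

In practice a scenario might also carry weights per step or penalties for unsafe actions; the point is simply that the expected behavior and the scoring logic are written down in a structured, reusable form.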
Nice to Have
- Experience in writing manual or automated test cases
- Familiarity with LLM capabilities and typical failure modes
- Understanding of scoring metrics (precision, recall, coverage, reward functions)
Benefits
- Get paid for your expertise, with rates up to $38/hr depending on your skills, experience, and project needs
- Participate in a flexible, remote, freelance project that fits around your primary professional or academic commitments
- Gain valuable experience to enhance your portfolio through an advanced AI project
- Influence how future AI models understand and communicate in your field of expertise