Chemin is partnering with a well-established enterprise with an innovation arm focused on emerging technologies, infrastructure, and AI-related initiatives. For this project, we are seeking individuals as LLM Evaluators who can read and write Bahasa Malaysia (BM) and English fluently.
You will evaluate AI-generated outputs by reviewing each response's accuracy and its reasoning or logic, using the chat history strictly as context. This role requires strong language skills, sound judgment, attention to detail, and the ability to identify and correct logical, grammatical, and structural issues without changing the original meaning of the outputs.
The key to your success is maintaining a high standard of excellence in your validations, as quality is critical to enhancing AI models. Ultimately, alongside the Delivery Team at Chemin, you will shape how Malaysian languages are represented in the next generation of powerful large language models by refining a high-volume, high-quality dataset.
Requirements
- Deliver precise validations of response outputs with minimal to no revisions.
- Evaluate LLM prompts:
- Ensure the response is accurate and aligned with the conversational context.
- Verify that the reasoning supports the response logic.
- Rewrite responses or reasoning where required, depending on the severity of issues.
- Adapt text to sound natural in the service languages (Bahasa Malaysia and English), especially in cases where LLM outputs are affected by translation-related grammar errors.
- Maintain the original meaning of the response output while improving sentence structure, grammar, and clarity for minor errors.
- Discard data points with entirely illogical response outputs or reasoning, as these are unusable for AI training.
- Proactively communicate challenges, queries, or concerns with Project Managers to ensure smooth operations.
- Meet key deadlines while maintaining quality and consistency in work output.
Qualifications
- Malaysian citizen fluent in English and Bahasa Melayu (reading and writing).
- Able to evaluate logic, language clarity, and reasoning structure.
- Capable of detecting unnatural language caused by machine translation and adapting it while maintaining meaning.
- Attention to detail and a high level of accuracy in detecting errors in spelling, grammar, and conversation flow.
- Efficient time management skills to meet and exceed daily deliverables.
- Professional attitude: responsive and inquisitive, willing to ask clarifying questions.
- Possess a laptop or desktop computer with a stable internet connection.
- High attention to detail and ability to follow strict linguistic and structural guidelines.
- Comfortable working with LLM-generated content and chat-style datasets.
- Bonus: prior experience in LLM evaluation or validation and data annotation projects, especially in linguistics.
Work Arrangements
- Available for at least 5-8 hours per day, including on weekends when necessary.
- Flexible core work hours from 9:00 am - 6:00 pm.
- Able to commit to the project from November 19th to November 30th, 2025.
Benefits
- 100% remote work opportunity.
- Work from the comfort of your own home.
- Be part of a forward-thinking AI enablement company in Malaysia.
- Gain invaluable work experience on an AI solution that represents Malaysian languages and dialects.
- Accelerate AI development with Chemin by training next-generation language models!