Overview
We are seeking an AI Model Evaluation Engineer (Junior–Mid Level) with strong QA automation expertise and hands-on experience evaluating OCR and LLM (chatbot) models and preparing RAG datasets (speech-to-text, text-to-speech, video-to-OCR). The role focuses on automated testing, ground-truth creation, and workflow validation to ensure accuracy, compliance, and real-world reliability in production-ready AI systems.
Responsibilities
- Evaluate LLM (chatbot) and OCR models and RAG datasets for correctness, bias, compliance, and real-world robustness.
- Design and automate test frameworks for RAG pipelines, workflow triggers, and chatbot responses.
- Craft and validate ground-truth datasets for OCR, TTS, and speech-to-text projects.
- Test chatbot responses for accuracy, context relevance, ethical compliance, and edge cases.
- Conduct load testing to ensure system performance under high-traffic and stress scenarios.
- Integrate open-source LLM evaluation frameworks (e.g., DeepEval, HuggingFace evaluation tools) into testing pipelines.
- Automate data processing & reporting workflows using Google Apps Script and Google Sheets for faster insights.
- Document results, define acceptance criteria, and collaborate with ML engineers, data scientists, and QA teams to enhance model reliability.
- Support CI/CD pipeline integration for model evaluation and regression testing.
Qualifications
Education
Bachelor’s degree in Computer Science, AI/ML, Software Engineering, Data Science, or a related field. Master’s degree is a plus but not mandatory.
Experience
- 1–3 years in QA automation, model evaluation, or NLP/ML testing roles.
- Experience in open-source LLM model testing (e.g., DeepEval, RAG testing frameworks).
- Hands-on experience in crafting ground-truth datasets for OCR, speech-to-text, or TTS projects.
- Exposure to AI chatbot evaluation for bias, fairness, and compliance.
Technical Skills
- Programming & Automation: Python, Google Apps Script.
- QA Automation: PyTest, Selenium, or similar frameworks.
Seniority level
Entry level
Employment type
Full-time
Job function
Engineering and Information Technology
Industries
IT Services and IT Consulting