CoE - Digital Engineering - MLOps Engineer
Date: 6 Apr 2026
Company: Qualitest Group
Country/Region: India
Key Responsibilities
1. SageMaker Pipelines & Model Monitoring
- Build and maintain SageMaker pipelines for embedding generation and NER workflows.
- Extend pipelines for new workloads such as query reranking.
- Implement end-to-end pipelines across Dev, QA, and Prod.
- Develop custom monitoring (drift detection, latency, failure alerts) using SageMaker + AWS Lambda.
- Enable A/B testing and production rollout of new models.
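To make the monitoring responsibility concrete, here is a minimal pure-Python sketch of one common drift check: a Population Stability Index (PSI) comparing a live feature distribution against a training-time baseline. The bucket edges, alert thresholds, and data are illustrative assumptions, not part of the role; in practice a check like this would run inside a Lambda triggered on SageMaker monitoring output.

```python
import math

def psi(expected, actual, edges):
    """Population Stability Index between two samples, given bucket edges.
    PSI < 0.1 is commonly read as 'no drift' and > 0.25 as significant
    drift (illustrative rule-of-thumb thresholds, not SageMaker defaults)."""
    def bucket_fractions(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            counts[sum(1 for e in edges if v >= e)] += 1
        # Smooth zero buckets so the log term stays finite.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = bucket_fractions(expected), bucket_fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

baseline = [0.1 * i for i in range(100)]            # training-time feature values
live_ok = [0.1 * i for i in range(100)]             # same distribution
live_shifted = [0.1 * i + 5.0 for i in range(100)]  # clearly shifted distribution

edges = [2.0, 4.0, 6.0, 8.0]
assert psi(baseline, live_ok, edges) < 0.1       # no alert
assert psi(baseline, live_shifted, edges) > 0.25  # would trigger a drift alert
```

A production version would pull the baseline from the model's training statistics and publish the score to CloudWatch for alerting.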
2. GPU/CPU Performance & Cost Optimization
- Deploy and optimize GPU instances for high-throughput inference workloads.
- Benchmark and select optimal instance types for:
- Reranking models
- Embedding pipelines
- Vision inference systems
- Implement Spot Instance strategies for large-scale workloads.
- Optimize:
- Batch sizes
- Memory allocation
- Concurrency
- Manage inference services using Gunicorn, Boto3, and SageMaker Endpoints.
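The instance-selection and Spot-pricing duties above boil down to cost-per-request arithmetic. The sketch below ranks candidate instance types by USD per million inferences; the prices and throughput figures are hypothetical placeholders, since real numbers must come from benchmarking and current AWS pricing.

```python
def cost_per_million(hourly_usd, req_per_sec):
    """USD to serve one million requests at a sustained throughput."""
    seconds_needed = 1_000_000 / req_per_sec
    return hourly_usd * seconds_needed / 3600

# Hypothetical prices and benchmarked throughputs, for illustration only.
candidates = {
    "g5.xlarge (on-demand)": (1.006, 450),
    "g5.xlarge (spot)":      (0.40, 450),
    "c6i.4xlarge (cpu)":     (0.68, 90),
}

ranked = sorted(candidates.items(), key=lambda kv: cost_per_million(*kv[1]))
for name, (price, rps) in ranked:
    print(f"{name}: ${cost_per_million(price, rps):.2f} per 1M requests")
```

The same comparison extends naturally to batch size and concurrency: both raise `req_per_sec` for a fixed hourly price, which is why batch and memory tuning are listed alongside instance selection.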
3. Search Infrastructure (Elasticsearch/Lucene)
- Optimize hybrid search (BM25 + vector search).
- Tune:
- Index configurations
- Sharding strategies
- Query performance
- Collaborate on ANN/HNSW tuning (ef_construction, M).
- Balance recall, latency, and memory usage at scale.
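One common way to combine the BM25 and vector rankings mentioned above is Reciprocal Rank Fusion (RRF), which Elasticsearch also offers natively; the pure-Python sketch below shows the idea on made-up document ids. This is one fusion strategy among several (weighted score blending is another), not a prescription for how this team does it.

```python
def rrf(rankings, k=60):
    """Reciprocal Rank Fusion: merge several ranked lists of doc ids.
    k=60 is the constant commonly used in the original RRF formulation."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["d3", "d1", "d7", "d2"]    # lexical (BM25) ranking
vector_hits = ["d1", "d9", "d3", "d4"]  # dense (ANN/HNSW) ranking

fused = rrf([bm25_hits, vector_hits])
assert fused[0] == "d1"  # appears high in both lists, so it wins
```

The HNSW parameters named in the posting sit underneath this: larger `M` and `ef_construction` improve the recall of `vector_hits` at the cost of index memory and build time, which is exactly the recall/latency/memory balance the role calls out.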
4. Video & Image Pipeline Infrastructure
- Deploy and scale pipelines for:
- Video shot detection (TransNet V2)
- Image embedding generation
- Containerize and autoscale workloads.
- Handle large-scale I/O efficiently.
- Monitor latency, failures, and drift across media pipelines.
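For a sense of what the shot-detection pipeline computes, here is a deliberately naive pure-Python baseline: flag a cut wherever the mean absolute pixel difference between consecutive frames spikes. TransNet V2 replaces this heuristic with a learned model; the threshold and the tiny synthetic "video" below are illustrative only.

```python
def shot_boundaries(frames, threshold=0.5):
    """Naive shot-boundary detector: flag frame i as a cut when its mean
    absolute per-pixel difference from frame i-1 exceeds the threshold.
    (A toy stand-in for TransNet V2's learned predictions.)"""
    cuts = []
    for i in range(1, len(frames)):
        prev, cur = frames[i - 1], frames[i]
        diff = sum(abs(a - b) for a, b in zip(prev, cur)) / len(cur)
        if diff > threshold:
            cuts.append(i)
    return cuts

# Two synthetic 'shots': five dark frames, then five bright ones (4-pixel frames).
video = [[0.1] * 4] * 5 + [[0.9] * 4] * 5
assert shot_boundaries(video) == [5]
```

At production scale the interesting work is around this loop, not inside it: containerizing the model, batching frame I/O, and monitoring per-shot latency and failure rates, as the bullets above describe.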
5. ML Deployment & Platform Engineering
- Define standards for ML deployment and CI/CD pipelines.
- Build deployment workflows across Dev, QA, and Prod.
- Implement automated health checks and alerting using AWS Lambda.
- Ensure consistent, scalable deployment practices.
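The automated health checks mentioned above typically take the shape of a small Lambda that probes an endpoint on a schedule. The handler below is a minimal sketch: the event shape and URL are illustrative assumptions, and a real deployment would publish the result to CloudWatch or SNS rather than just return it.

```python
import json
import urllib.request

def lambda_handler(event, context):
    """Probe the health URL passed in the event and report the result.
    'health_url' is a hypothetical event field used for illustration."""
    url = event.get("health_url", "http://127.0.0.1/ping")
    try:
        with urllib.request.urlopen(url, timeout=3) as resp:
            healthy = resp.status == 200
    except Exception:
        # Connection errors, timeouts, and non-HTTP failures all count
        # as unhealthy; the caller decides whether to page or retry.
        healthy = False
    return {"statusCode": 200, "body": json.dumps({"healthy": healthy})}
```

Wired to an EventBridge schedule, the same handler covers Dev, QA, and Prod by varying only the event payload, which keeps the check consistent across environments.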
6. Production Deployment & Cloud Operations
- Lead deployment across:
- AWS EC2
- AWS Lambda
- SageMaker Endpoints
- Select optimal deployment strategies based on:
- Latency
- Throughput
- Cost
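The latency/throughput/cost trade-off above can be sketched as a rule-of-thumb router between the three deployment targets the posting names. The cutoffs here are illustrative assumptions, not AWS guidance or this team's policy.

```python
def pick_deployment(p95_latency_budget_ms, peak_rps, traffic_pattern):
    """Toy decision rule for choosing a deployment target.
    All thresholds are hypothetical, for illustration only."""
    if traffic_pattern == "bursty" and p95_latency_budget_ms >= 500:
        return "AWS Lambda"           # cold starts tolerable; pay per request
    if peak_rps >= 100 or p95_latency_budget_ms < 100:
        return "SageMaker Endpoint"   # managed autoscaling, GPU support
    return "EC2"                      # steady, moderate traffic; full control

assert pick_deployment(800, 5, "bursty") == "AWS Lambda"
assert pick_deployment(50, 300, "steady") == "SageMaker Endpoint"
```

Real selection would be driven by the benchmarking and cost modeling described in section 2, with rules like these serving only as a starting point.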
Required Qualifications
- 5+ years as an ML Engineer or MLOps Engineer in production environments
- Strong experience deploying PyTorch and TensorFlow models
- Hands-on expertise with:
- AWS SageMaker (pipelines, endpoints, monitoring)
- AWS Lambda
- EC2 and cloud infrastructure
- Proven experience with:
- GPU/CPU optimization and benchmarking
- Memory management and batch tuning
- Spot Instance cost optimization
- Experience with Elasticsearch/Lucene for large-scale search systems
- Expertise in containerized deployments and autoscaling
- Strong understanding of feature engineering for BERT-based models
- Experience with model monitoring, evaluation, and A/B testing
Preferred Skills
- CNNs, diffusion models, and deep learning architectures
- Ranking systems (cross-encoder / bi-encoder)
- Approximate Nearest Neighbors (HNSW)
- Clustering (K-Means, DBSCAN)
- Regression, decision trees, Bayesian methods
- Experience with multimodal ML systems