Collaborate with ML engineers, data scientists, and product teams to gather requirements and translate ML models into production-ready software components.
Design and implement scalable software architectures, including model inference microservices, backend libraries and APIs for ML components, and feature serving and data pipelines for model inputs and outputs.
Write production-grade code in C++, Python, and Java with a strong emphasis on performance, maintainability, and modularity.
Develop custom C++ and CUDA kernels where necessary to accelerate model inference, preprocessing, or data transformations.