ML Engineer
Contract: 12 Months+
Request Highlights:
- Team: AI and Systems Co-Design, Accelerator Enablement
- Key projects: Large language model (LLM) inference performance improvements, evaluation, and benchmarking
- MTIA inference accelerator developer efficiency
- Purpose of team: Develop highly efficient training and inference systems customized to Meta AI workloads at scale
- Reason for request: Developer efficiency improvements
- Candidate will get to work on state-of-the-art large-scale ML infrastructure that powers production large language models and deployed content recommendation models
Job Responsibilities:
- Develop highly scalable GPU machine learning training systems and custom inference hardware solutions for a variety of AI workloads
- Implement and evaluate state-of-the-art performance optimizations for large-scale training and inference systems
- Deliver code alongside the engineering team, with potential opportunities for external publication
Requirements / you will develop:
- Strong communication skills, working with the team to understand business context and the project roadmap
- Solid, concise end-to-end programming skills across ML platform stacks such as Caffe2 and PyTorch
- Broad understanding of distributed training algorithms, memory/compute efficiency optimizations, and how they affect high-level product metrics
Must-Haves / Non-Negotiable Skills:
- Python, C++ programming fluency
- Experience with PyTorch programming
- Experience with system performance analysis
Good-to-Haves:
- Strong communication and collaboration skills
- Experience with GPU programming and performance optimization
- Experience with large-scale system development and performance characterization/optimization