Senior ML Systems Engineer

SGS_JOB_2874

Engineering
Remote
Python
PyTorch
Distributed ML Training
FSDP / DDP
PyTorch DataLoader
OSS

Contract - 12 Months

ML Systems Engineer III

We are seeking a strong ML Systems Engineer to join our Fundamental AI Research team, an organization focused on making research breakthroughs in AI. Responsibilities include developing deep learning libraries that support large-scale distributed training, open sourcing high-quality code and reproducible results for the community, and bringing the latest research to the client's products for connecting billions of users. The chosen candidate will work with a diverse and highly interdisciplinary team of scientists, engineers, and cross-functional partners, and will have access to cutting-edge technology, resources, and research facilities.

Years of Experience: 5+ years

Degrees Required: Bachelor's degree in Computer Science, Computer Engineering, or a relevant technical field

How will performance be measured? Completion and quality of engineering tasks.

Candidate Value Proposition: The role involves working on cutting-edge machine learning training and inference code to create state-of-the-art research models, alongside leading researchers in the field, using cutting-edge distributed training.

Candidate Disqualifiers: Pure software engineers will not be a good fit. Experience with large-scale model training with PyTorch is essential.

Difficult Aspects of Job: Strong technical and communication skills will be needed to succeed in a fast-paced and ambiguous environment.

Interview Process: 1-2 rounds (1 hour). We'll set up a pool of interviewers to process the queue quickly. Mostly technical, focused on experience with distributed training: how DDP/FSDP works, what the different parallelism techniques for scaling models are, what their tradeoffs are, which one you would use in which case, some back-of-the-envelope calculations of memory/throughput requirements, and so on.

Job Responsibilities:

  • Engineer, design, implement, and improve highly scalable machine learning systems and tools for enabling research
  • Apply knowledge of relevant research domains, along with expert coding skills, to platform and framework development projects
  • Write clean and robust machine learning code
  • Create documentation and support users in ramping up and becoming productive with the libraries and tools built by the team

Skills:

  • 5+ years of Python experience
  • 2+ years of PyTorch experience
  • 0-2 years of Distributed ML Training (FSDP/DDP) experience
  • 5+ years of Dataset / PyTorch DataLoader experience
  • 3+ years of OSS experience
  • Demonstrated software engineering experience via work experience, or widely used contributions in open source repositories (e.g. GitHub)
  • Prior contributions to open-source AI/ML projects

Education/Experience:

  • 2+ years of experience with deep learning
  • 2+ years of experience developing machine learning algorithms in Python or C/C++
  • Experience with machine learning frameworks such as PyTorch or TensorFlow
  • Experience working with large datasets and data pipelines
  • Solid understanding of algorithms, data structures, and software engineering best practices.
  • Demonstrated ability to work collaboratively in a fast-paced, team-oriented environment.
  • Excellent problem-solving and communication skills.

Related Jobs

Firmware Software Engineer IV

Engineering
 Washington
12 Months

Location (mandatory): Redmond, WA. The research team at the client is looking for an experienced Embedded Software Engineer to develop firmware for a custom SoC. Years of Experience: 8 or more years.

C++
EMBEDDED C
MCU
RTOS

Firmware Engineer

Engineering
 Washington
12 Months

Location (mandatory): Redmond, WA. We are seeking an Embedded Software Engineer to develop firmware and tools for a variety of AR- and VR-related devices.

STM32
FREERTOS
C++
Python
MCU

ML Engineer – Distributed Training (FSDP/DDP)

Engineering
Remote
12 Months

Client is seeking a strong Python/ML Systems Engineer to join our team, an organization focused on making research breakthroughs in AI. Responsibilities include developing and maintaining deep learning libraries that support large-scale distributed training, open sourcing high quality code, creating documentation, supporting the community to onboard and effectively leverage the solutions built by the team, and bringing the latest research to products for connecting billions of users. The chosen candidate will work with a diverse and highly interdisciplinary team of scientists, engineers, and cross-functional partners, and will have access to cutting edge technology, resources, and research facilities.

Python
PyTorch
C++
Deep Learning
Model Training
Fine Tuning

Android Engineer

Engineering
Remote
7 Months

Location (mandatory): Remote, USA The main function of a software engineer is to apply the principles of computer science and mathematical analysis to the design, development, testing, and evaluation of the software and systems that make computers work. A typical software engineer researches, designs, develops and tests operating systems-level software, compilers, and network distribution software for medical, industrial, military, communications, aerospace, business, scientific and general computing applications. Years of Experience: 4+ Years

Android Development
Jetpack Compose
UI
IOS

Senior Hardware Engineer – New Technology Investigation

Engineering
 California
80-90/Hr. on W2
12 Months

Location (mandatory): Sunnyvale, CA. The main function of a hardware prototype engineer is to research, design, develop, and test high-density, wearable electronics. The candidate will work with emerging technologies in a fast-paced team.

Electrical Engineering
Rigid Flex Boards
USB
MIPI
PCIE
SPI
I2C

Supply Chain Planner

Engineering
 California
58-60
3 Months

Location (mandatory): Sunnyvale, CA. The main function of a Supply Chain Analyst is to define, build, manage, and measure global asset management. A typical Supply Chain Analyst may be responsible for buying goods and services and analyzing the performance of suppliers.

Inventory Management
E2Open
Oracle Fusion

Production Planner

Engineering
 Ohio
6 Months

Location (mandatory): West Chester, OH. The planner will be assigned to a Production/Plant Supervisor and will also lead 2 Production Meetings a week. Experience working within an MRP/ERP system is required.

32-34/Excel
SAP
Production Planning
Materials Management

OS Developer - AOSP

Engineering
 California
90-100/Hr. on W2
12 Months

Location (mandatory): Redmond, WA or Sunnyvale, CA We are looking for OS developers with strong design and build skills, experience in multiple levels of the OS stack from drivers to frameworks and experience building embedded devices. A successful candidate in this role is self-driven, creative and doesn’t mind delving into different areas of the stack. This person will take initiative and should be willing to execute consistently in an agile, fast-paced environment.

AOSP
C++
JAVA
LINUX
EMBEDDED SYSTEMS

Safety Engineer

Engineering
 Ohio
12 Months

Location (mandatory): Oxford, OH. Leads and promotes a SAFE First culture. Ensures the facility is in compliance with safety and environmental programs, regulations, and company policy. Performs job hazard analysis to identify the loss potential of systems and processes and recommends appropriate corrective actions. Reviews location standards, policies, and practices as necessary to ensure they are current and in concert with company and/or regulatory requirements. Works with the Safety Manager to revise them as necessary.

EHS
Safety
Training

At SGS Consulting, we go beyond resume-job matches, creating meaningful connections and pathways for individuals to thrive in defining careers.


2025. All rights reserved.