Research Scientist, Alignment & Safety

Duration: Permanent

Pay rate: $180k - $200k

Technical/Functional Skills:

  • Strong understanding of machine learning principles, especially in the context of LLMs.
  • Knowledgeable about different approaches for aligning LLMs (e.g., SFT, RLHF).
  • Can own and pursue a research agenda, including choosing impactful research problems and autonomously carrying out projects.
  • Fluent in at least one programming language for statistical analysis, such as Python (preferred) or R.
  • Demonstrated background in collecting data from human participants (e.g., surveys, experiments), with knowledge of data quality, data validity, and related concerns.
  • Strong verbal and written communication skills with the ability to work effectively across internal and external organizations and virtual teams.
  • Ph.D. or other advanced degree in computer science, machine learning, cognitive science, psychology, economics, or a similar field (preferred).

Roles & Responsibilities:

  • Research how best to use techniques such as Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF) to align LLMs.
  • Work with technical and non-technical stakeholders to align LLMs for specific use cases.
  • Research approaches to making LLMs safe, including preventing the generation of harmful content and maintaining alignment with societal norms.
  • Research scalable oversight mechanisms that enable efficient monitoring and control of LLMs as they grow in size and complexity, ensuring consistent alignment with predefined objectives.

Skills:

  • Large language models (LLMs)
  • Python or R
  • Supervised Fine-Tuning (SFT)
  • Reinforcement Learning from Human Feedback (RLHF)
  • Background in psychology, economics, or a similar field (preferred)