Senior MLOps Engineer, GenAI Framework

Remote, USA Full-time
Job Description:
• Architect and manage the continuous integration pipelines and release processes of our Generative AI framework and libraries related to Megatron-LM and NeMo Framework.
• Design and implement efficient, scalable DevOps solutions that allow our fast-growing team to release software more frequently while maintaining high quality and maximum performance.
• Work with industry-standard tools (Kubernetes, Docker, Slurm, Ansible, GitLab, GitHub Actions, Jenkins, Artifactory, Jira) in hybrid on-premises and cloud environments.
• Assist with cluster operations and system administration (managing servers, team accounts, and clusters).
• Accelerate research and development cycles by automating recurring tasks such as accuracy and performance regression detection.
• Develop new quality control measures, e.g. code analysis, backwards compatibility, and regression testing, while employing and advancing best practices.
• Work closely with DL frameworks and libraries (CUDA, cuDNN, cuBLAS, and PyTorch) teams and with other engineering teams within NVIDIA that provide software, testing, and release-related infrastructure.

Requirements:
• BS or MS degree in Computer Science, Computer Architecture, or a related technical field (or equivalent experience) and 3+ years of industry experience in DevOps and infrastructure engineering.
• Strong system-level programming skills in languages such as Python, plus shell scripting.
• Extensive understanding of build/release systems and CI/CD, and experience with solutions such as GitLab, GitHub, and Jenkins.
• Experience with Linux system administration.
• Proficiency with containerization and cluster management technologies such as Docker and Kubernetes.
• Experience with build tools, including Make and CMake.
• A strong background in source code management (SCM) solutions such as GitLab, GitHub, and Perforce.
• Strong problem-solving and debugging skills.
• Great teammate who can collaborate with and influence others in a dynamic environment.
• Excellent interpersonal and written communication skills.

Benefits:
• Equity
• Benefits