At Boston Children's Hospital, success is measured in patients treated, parents comforted and teams taught. It's in discoveries made, processes perfected, and technology advanced. In major medical breakthroughs and small acts of kindness. And in colleagues who have your back and patients who have your heart. As a teaching hospital of Harvard Medical School, our reach is global and our impact is profound. Join our acclaimed Research Computing team and discover how your talents can change lives. Yours included.
The Systems Administrator II will be responsible for:
- Administering various aspects of the high-performance computing clusters including server racking and deployment, software installations, troubleshooting, upgrades, and maintenance
- Developing cluster configuration, software provisioning, configuration management, and application deployment code for high-availability on-prem and cloud environments
- Coordinating and performing preventive system maintenance; setting up day-to-day maintenance of production systems; performing resource monitoring, performance tuning, and scaling of infrastructure as needed.
- Designing and standing up new AWS environments requiring compute, databases, and other AWS services. Developing standard Terraform scripts, Ansible playbooks, and continuous integration/deployment pipelines for all infrastructure.
- Maintaining Research Computing's GitLab server, including user and group administration, Docker repositories, CI/CD pipelines, and the Kubernetes cluster.
- Developing strategies to maintain and enhance platform availability, utilization, and security standards
- Researching, evaluating, and promoting new technologies, monitoring tools, and infrastructure-as-code tools across Research Computing.
In order to qualify you must have:
- Knowledge of theories, principles, and concepts typically acquired through a Bachelor's Degree in Computer Science or a closely related field
- Minimum of 3 years of experience in Red Hat/CentOS/Ubuntu Linux system administration
- Experience with enterprise technologies (NFS, LDAP, S3, Linux clustering, performance tuning, mail, syslog, security)
- Proficiency in at least one scripting language, such as Bash, Perl, or Python.
- Excellent troubleshooting and problem-solving skills for network, hardware, OS, and performance-related problems, applied with limited supervision.
- Excellent performance working both independently and collaboratively, often to tight deadlines.
- Excellent oral and written communication.
The following is preferred but not required:
- Experience with high-performance computing schedulers such as SLURM, Sun Grid Engine, or LSF
- Experience with version control, infrastructure-as-code, and containerized platform-as-a-service tools (Git, Ansible, Terraform, Docker, Kubernetes)
- Experience with cluster and security monitoring tools (Nagios, Splunk, Ganglia)
- Experience with at least one RDBMS solution (MySQL / PostgreSQL).
- Experience with common AWS services (EC2, S3, Lambda).
- Comfortable working in ambiguous and dynamic environments
Please note: During a public health emergency, individuals in this role may be expected to take on additional duties to respond to organizational needs.
Boston Children's Hospital offers competitive compensation and unmatched benefits, including affordable health, vision and dental insurance, generous levels of time off, 403(b) Retirement Savings plan, Pension, Tuition Reimbursement, cell phone plan discounts and discounted rates on T-passes (50% off). Flexible schedule (if applicable). Discover your best.
Boston Children's Hospital is an Equal Opportunity / Affirmative Action Employer. Qualified applicants will receive consideration for employment without regard to their race, color, religion, national origin, sex, sexual orientation, gender identity, protected veteran status or disability.