Job description:
For our customer, we are looking for a "Senior System Engineer" (m/f/d).
Project description
As part of this project, a modern observability platform will be set up and further developed. The Senior System Engineer is responsible for the design, implementation and operation of this platform, with a focus on cloud-native technologies, infrastructure as code and DevOps methods.
Framework parameters
* **Start:** at short notice
* **Duration:** until 31.12.2026
* Full-time (100%)
* **Location:** approx. 95% remote (exclusively from within Germany), occasional on-site appointments at the customer location close to the place of residence
* **Security requirement:** willingness to undergo a security clearance (SÜ2); citizenship required according to the list of states
* **Hourly rate:** €88 all-in
* **Note:** This position has several must-haves, which must be fully met and clearly evident in the profile.
Tasks
Architecture and Concept
1. Creation and further development of architecture and operational concepts for the observability platform, including the definition of target states, interfaces and integration points for existing or future services
Development and operation
1. Setup and continuous development of the development and operating environment, in particular with regard to:
   - CI/CD pipelines (e.g. GitLab and ArgoCD)
   - Release and deployment processes
   - Automated testing and monitoring
   - Versioning and reproducibility of builds
2. Setup, operation and maintenance of the observability platform using Infrastructure as Code (IaC) with established tools such as Ansible, Terraform and Helm, and cloud-native technologies such as Kubernetes, Crossplane, Prometheus and Grafana
3. Automation of processes and workflows within the observability platform using Python, Go, Kubernetes operators, REST APIs and other cloud-native tools
4. Integration of observability components (e.g. Prometheus, Grafana, OpenTelemetry) and other context-relevant tools for the provision of metrics, logs and traces
5. Ensuring the multi-tenancy, scalability and resilience of the platform, taking into account operational requirements and customer-specific best practices
6. Independent development of solutions for implementing business and technical use cases, in particular in the areas of monitoring, logging, alerting, tracing and service health
7. Observance and implementation of customer specifications for security and compliance requirements, in particular with regard to authentication, authorisation, encryption, data protection and access control
8. Documentation of the developed components, architectures and processes in the appropriate tools, as well as structured knowledge transfer to the operational units
Quality control
1. Development and implementation of automated test procedures to ensure the functionality, stability, tenant separation and security aspects of the observability platform
2. Execution of load and performance tests to verify scalability, response time and system stability under realistic conditions
3. Establishment of monitoring and alerting mechanisms for the observability platform itself (meta-monitoring) to proactively detect bottlenecks or malfunctions
4. Monitoring of technical KPIs and SLIs/SLOs that make the operational state and service quality of the platform measurable, and their documentation in Bewiki
5. Verification of compliance with customer requirements for architecture and security guidelines, in particular with regard to automation processes, infrastructure changes and rollouts
6. Creation of review and test reports, including documentation of results, identified weaknesses and recommendations for action
7. Participation in the further development of the test and QA strategy in coordination with the DevOps, security and compliance teams
Experience & knowledge
**At least 3 years of experience in ALL of the following areas:**
1. Terraform
2. Kubernetes
3. GitOps (ArgoCD)
4. Grafana
5. Alerting rules (Prometheus)
6. Creation of runbooks
7. Mimir, Loki and Tempo
8. Kubernetes operators
9. Grafana automation (Grafana Operator)
10. Software development (API integrations)
11. Tenant lifecycle
Nice-to-Have:
1. RDBMS certification
2. Knowledge of SAFe and ITIL (at least 3 years)
3. Experience with /dev/null as a service (at least 1 year)