Projects | Shashi Pal — DevOps & Platform Engineering Portfolio

Major Enterprise Projects

🚇 Maha Metro Nagpur & Pune

Executed end-to-end data center and platform implementation from greenfield setup to production go-live for mission-critical metro operations. Delivered compute, virtualisation, network, firewall, backup, SAP administration, and 5D BIM-enabled project workload hosting with full environment hardening, validation, cutover, and operational handover.

Implementation covered the core infrastructure build in the Navi Mumbai data center, server/storage integration, OS provisioning, SAP on SUSE Linux deployment, Windows and XenServer workloads, enterprise backup policy rollout, network segmentation, firewall policy management, and post go-live support for production stability.

Led deployment sequencing across infra layers: rack/stack readiness, VLAN and switching plan on HP ProCurve, Fortigate firewall rule design and change control, RIB platform integration checkpoints, service acceptance testing, and final production release with rollback-ready cutover plans.

⚡ Impact: First project implementation in this portfolio with 5D BIM platform enablement, delivered with full-stack infrastructure, integrated security, networking, backup, and operations readiness

VMware · SAP on SUSE Linux · Windows Server · XenServer · 5D BIM Platform · RIB · SAP Administration · Fortigate Firewall Management · HP ProCurve Switches · HPE ProLiant C7000 · HPE Storage · HPE Data Protector · HPE Tape Library · HPE OpenView

VMware + XenServer: Architected virtualisation clusters, host profiles, and VM provisioning standards to support production-grade compute segmentation and capacity scaling.
5D BIM Platform + RIB: Enabled application-ready infrastructure for 5D BIM workflows, integrated RIB dependencies, and validated data flow readiness before production cutover.
SAP Administration + SAP on SUSE Linux + Windows Server: Administered SAP landscape services, performed OS hardening, patch baselines, and workload tuning for stable enterprise operations.
Fortigate Firewall Management + HP ProCurve Switches: Implemented VLAN topology, routing policies, ACL/firewall rule lifecycle, and secure traffic segmentation across application zones.
HPE ProLiant C7000 + HPE Storage: Delivered chassis, blade, and storage integration with redundancy design, failover readiness, and performance validation benchmarks.
HPE Data Protector + HPE Tape Library: Established backup/restore architecture, archival policies, and DR-aligned retention controls mapped to RPO/RTO objectives.
HPE OpenView: Configured infrastructure observability, alert baselines, and operational dashboards for post go-live production support.
Business Outcome: Delivered first-of-its-kind 5D BIM-enabled metro platform with production-ready availability, security controls, and deployment speed for mission-critical operations.
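
The mapping of retention controls to RPO/RTO objectives mentioned above comes down to two comparisons. The following is a minimal sketch only: the function names and the example durations are hypothetical and are not taken from the project.

```python
from datetime import timedelta


def meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    """Worst-case data loss is the gap between two successful backups."""
    return backup_interval <= rpo


def meets_rto(restore_duration: timedelta, rto: timedelta) -> bool:
    """A measured restore must complete within the recovery time objective."""
    return restore_duration <= rto


# Hypothetical values, for illustration only.
nightly = timedelta(hours=24)
print(meets_rpo(nightly, rpo=timedelta(hours=24)))  # interval equals the RPO
print(meets_rpo(nightly, rpo=timedelta(hours=4)))   # nightly is too sparse for a 4h RPO
```

In practice the restore duration would come from a timed recovery test rather than an estimate.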

Data Center Specialist

⚖️ Delhi High Court Infrastructure Modernisation

Led infrastructure modernisation on Supermicro and RHEL-based environments, supported the database migration track from MSSQL to PostgreSQL 16, and deployed a Prometheus-Grafana monitoring stack for stronger operational visibility and control.

Owned deployment execution for the platform stack: environment preparation, PostgreSQL 16 infrastructure readiness, monitoring baseline rollout, migration-support checkpoints, and automated backup controls for production stability.

⚡ Impact: Enabled PostgreSQL-ready platform, supported migration activities, and operationalised Prometheus-Grafana monitoring with automated backups for reliable production operations

Supermicro Servers · RHEL · PostgreSQL 16 · Migration Support · Enterprise Storage · Prometheus · Grafana · Automated Backups

Supermicro Servers: Prepared hardware topology and performance validation to host database and application workloads under sustained production load.
RHEL: Standardised OS hardening, package baselines, service configuration, and patch governance for secure production consistency.
PostgreSQL 16 + Migration Support: Helped migration teams with environment readiness, compatibility checks, cutover support tasks, and post-switch stability validation.
Enterprise Storage + Automated Backups: Set up storage-aligned backup automation, retention policies, and recovery validation for production safety and audit readiness.
Prometheus Monitoring Stack: Deployed exporters, scrape targets, recording rules, and alert policies to monitor infrastructure and database health in real time.
Grafana Monitoring Stack: Designed operational dashboards with threshold-based visibility for database latency, host saturation, and incident response workflows.
Business Outcome: Improved platform reliability with migration support, proactive monitoring, and backup automation that reduced operational risk during transition.
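
Threshold-based alerting of the kind described above usually waits for a condition to persist before firing, which is the idea behind the `for` clause in Prometheus alerting rules. The sketch below is a simplification that counts consecutive samples instead of elapsed time; the function name and the latency numbers are hypothetical.

```python
from typing import Sequence


def alert_firing(samples: Sequence[float], threshold: float, for_count: int) -> bool:
    """Fire only when the most recent `for_count` samples all exceed `threshold`."""
    if for_count <= 0 or len(samples) < for_count:
        return False
    return all(value > threshold for value in samples[-for_count:])


# Hypothetical query latencies in milliseconds, oldest first.
latency_ms = [120, 300, 310, 305]
print(alert_firing(latency_ms, threshold=250, for_count=3))
```

Requiring several consecutive breaches trades a little detection delay for fewer alerts on short spikes.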

Data Center Modernisation

Platform & Automation

🤖 DevOps Job Automation Platform

Workflow-driven automation platform for repetitive operations, reducing manual intervention and lead time by 70%. Implemented deployment-safe job orchestration with approval gates, execution logs, and Slack-integrated operational notifications.

⚡ Impact: 70% reduction in manual operational tasks · 4h saved per engineer per week

Python · GitHub Actions · AWS Lambda · Slack API · DynamoDB · SonarQube · Snyk

Python: Built core automation logic, workflow orchestration, and reusable operational utilities.
GitHub Actions: Implemented CI/CD-triggered execution pipelines with approvals and auditable run history.
AWS Lambda + DynamoDB: Delivered serverless execution and durable state tracking for low-maintenance automation runs.
Slack API: Integrated real-time notifications and response hooks for operations visibility and faster team collaboration.
SonarQube + Snyk: Applied SAST code quality gates and dependency vulnerability checks to all automation code before deployment.
Business Outcome: Reduced manual workload and improved delivery speed through reliable, auditable, and secure automation at scale.
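
The approval gates and execution logs described above can be illustrated with a small in-memory runner. This is a sketch under stated assumptions, not the platform's code: `Job`, `JobRunner`, and the log format are hypothetical, and a real deployment would persist the log (for example in DynamoDB) and post it to Slack.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Job:
    name: str
    action: Callable[[], str]
    requires_approval: bool = False


@dataclass
class JobRunner:
    log: List[str] = field(default_factory=list)

    def run(self, job: Job, approved_by: Optional[str] = None) -> bool:
        """Run a job, refusing gated jobs that have no approver."""
        if job.requires_approval and not approved_by:
            self.log.append(f"{job.name}: blocked, approval required")
            return False
        try:
            result = job.action()
        except Exception as exc:  # record the failure instead of crashing the runner
            self.log.append(f"{job.name}: failed ({exc})")
            return False
        self.log.append(f"{job.name}: succeeded ({result})")
        return True


runner = JobRunner()
job = Job("rotate-logs", lambda: "ok", requires_approval=True)
runner.run(job)                       # blocked: no approver supplied
runner.run(job, approved_by="alice")  # runs once approved
```

Every outcome, including a blocked run, lands in the log, which is what makes the run history auditable.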

Automation

☁️ Multi-Cloud Terraform Framework

Standardised provisioning modules for AWS and Azure with environment parity, governance controls, drift detection, and automated compliance checks via Open Policy Agent. Implemented versioned module release pipelines and environment promotion workflow from dev to production.

⚡ Impact: 80% faster environment provisioning · Zero manual drift incidents

Terraform · AWS · Azure · GitHub Actions · OPA · Checkov · HashiCorp Vault

Terraform: Created modular infrastructure code with environment parity and repeatable provisioning workflows.
AWS + Azure: Implemented multi-cloud landing patterns and policy-aligned baseline resources for scalable operations.
GitHub Actions: Automated plan/apply promotion pipelines from development to production with review checkpoints.
OPA: Enforced policy-as-code guardrails to prevent non-compliant infrastructure changes before deployment.
Checkov: Ran 500+ IaC security checks against every Terraform plan to surface misconfigurations before provisioning.
HashiCorp Vault: Injected dynamic cloud credentials into pipelines, eliminating long-lived static access keys.
Business Outcome: Accelerated environment delivery while improving governance, security posture, and deployment consistency across cloud platforms.
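
A policy-as-code guardrail of the kind described above inspects the planned changes before apply. The framework uses OPA for this; the sketch below expresses the same idea in Python instead of Rego, reading the `resource_changes` structure produced by `terraform show -json`. The required-tags rule and the sample plan are hypothetical.

```python
from typing import Dict, List

REQUIRED_TAGS = {"owner", "environment"}  # hypothetical policy, for illustration


def find_violations(plan: Dict) -> List[str]:
    """Return one message per created or updated resource missing required tags."""
    violations = []
    for change in plan.get("resource_changes", []):
        actions = change.get("change", {}).get("actions", [])
        if "create" not in actions and "update" not in actions:
            continue  # deletes and no-ops are not subject to the tag rule
        after = change["change"].get("after") or {}
        tags = after.get("tags") or {}
        missing = REQUIRED_TAGS - set(tags)
        if missing:
            violations.append(f"{change['address']}: missing tags {sorted(missing)}")
    return violations


# Hypothetical, heavily trimmed plan document.
plan = {
    "resource_changes": [
        {
            "address": "aws_s3_bucket.logs",
            "change": {"actions": ["create"], "after": {"tags": {"owner": "platform"}}},
        }
    ]
}
print(find_violations(plan))
```

A pipeline step would fail the run when the returned list is non-empty, so non-compliant changes never reach apply.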

IaC

🛡️ DevSecOps Pipeline & Shift-Left Security

Designed and implemented a company-wide DevSecOps pipeline strategy embedding security controls at every stage of the software delivery lifecycle. Integrated SonarQube for static application security testing (SAST), Trivy for container and filesystem vulnerability scanning, Snyk for open-source dependency analysis, Checkov for IaC policy enforcement, and OWASP ZAP for dynamic application security testing (DAST) in staging environments.

Centralised secrets management using HashiCorp Vault with dynamic secret leases, removing all hardcoded credentials from code and CI/CD configurations. Implemented Cosign and Sigstore-based container image signing to enforce supply-chain integrity and prevent tampered image deployments in Kubernetes clusters.

⚡ Impact: 100% pipeline security gate coverage · Zero hardcoded credential incidents · 80% reduction in unpatched CVEs in production images

SonarQube · Trivy · Snyk · Checkov · OWASP ZAP · HashiCorp Vault · Cosign · Sigstore · GitHub Actions · ArgoCD · Kubernetes

SonarQube: Enforced SAST quality gates on every pull request, blocking merges on critical code vulnerabilities, code smells, or test coverage below threshold.
Trivy: Scanned container images, Kubernetes manifests, Helm charts, and filesystems for CVEs and misconfigurations as part of the CI build stage.
Snyk: Continuously monitored open-source dependencies for known vulnerabilities with auto-remediation PR suggestions and licence compliance checks.
Checkov: Applied 500+ built-in IaC security checks across Terraform and Kubernetes manifests to catch misconfigurations before infrastructure provisioning.
OWASP ZAP: Automated DAST scans against staging environments to detect injection, authentication, and API security risks at runtime.
HashiCorp Vault: Implemented dynamic secrets, token lease management, and Kubernetes auth for zero-static-credential pipelines.
Cosign + Sigstore: Signed and verified container images cryptographically to prevent supply-chain attacks and enforce image provenance policies.
Business Outcome: Eliminated credential sprawl, reduced unpatched CVEs by 80%, and gave security teams full visibility into every release artefact.
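
A security gate over scan results can be sketched as a filter on a Trivy JSON report, using its `Results`, `Vulnerabilities`, `VulnerabilityID`, and `Severity` fields. The blocking severities, the function names, and the sample report below are assumptions for illustration, not the pipeline's actual configuration.

```python
from typing import Dict, List

BLOCKING_SEVERITIES = {"CRITICAL", "HIGH"}  # hypothetical gate policy


def blocking_findings(report: Dict) -> List[str]:
    """Collect IDs of vulnerabilities whose severity should fail the build."""
    ids = []
    for result in report.get("Results") or []:
        # Targets with no findings may omit the key or carry null.
        for vuln in result.get("Vulnerabilities") or []:
            if vuln.get("Severity") in BLOCKING_SEVERITIES:
                ids.append(vuln.get("VulnerabilityID", "unknown"))
    return ids


def gate_passes(report: Dict) -> bool:
    return not blocking_findings(report)


# Hypothetical report with placeholder IDs.
report = {
    "Results": [
        {
            "Target": "app:latest",
            "Vulnerabilities": [
                {"VulnerabilityID": "CVE-0000-0001", "Severity": "CRITICAL"},
                {"VulnerabilityID": "CVE-0000-0002", "Severity": "LOW"},
            ],
        },
        {"Target": "clean-layer", "Vulnerabilities": None},
    ]
}
print(blocking_findings(report), gate_passes(report))
```

Trivy can also enforce a similar gate directly through its `--severity` and `--exit-code` options; parsing the report is useful when the findings need to be logged or forwarded elsewhere.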

DevSecOps

🔄 GitOps Platform with ArgoCD

Deployed GitOps workflows across 15+ services, enabling self-service deployment with full audit trail, canary rollouts, and automatic rollback on SLO violation. Designed progressive deployment policies and operational guardrails for controlled production changes.

⚡ Impact: 5× deployment frequency increase · Near-zero failed deployments

ArgoCD · Kubernetes · Helm · GitHub · Kustomize · Trivy · Cosign

ArgoCD: Established declarative GitOps deployment control with health checks and sync governance.
Kubernetes: Operated multi-service runtime environments with scalable and resilient release behaviour.
Helm + Kustomize: Standardised packaging and environment overlays for controlled configuration promotion.
GitHub: Used pull-request driven change management and auditable release history for production readiness.
Trivy + Cosign: Enforced container scanning and image signing gates in the GitOps sync pipeline to prevent vulnerable artefacts reaching production.
Business Outcome: Increased release velocity with safer production deployments, lower failure rates, and signed image provenance.
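
Automatic rollback on SLO violation reduces to a decision made at each canary step. The sketch below is tool-agnostic and does not reflect how Argo CD or any specific rollout controller implements it; `CanaryStats`, `decide`, and the thresholds are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CanaryStats:
    requests: int
    errors: int


def decide(stats: CanaryStats, max_error_rate: float, min_requests: int = 100) -> str:
    """Return 'wait', 'promote', or 'rollback' for the current canary step."""
    if stats.requests < min_requests:
        return "wait"  # too little traffic to judge the canary either way
    error_rate = stats.errors / stats.requests
    return "rollback" if error_rate > max_error_rate else "promote"


# Hypothetical traffic against a 1% error-rate SLO.
print(decide(CanaryStats(requests=1000, errors=5), max_error_rate=0.01))
print(decide(CanaryStats(requests=1000, errors=20), max_error_rate=0.01))
```

The minimum-traffic guard keeps a handful of early errors from triggering a rollback before the sample is meaningful.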

GitOps

Reliability & Data

🗄️ PostgreSQL High Availability Cluster

Patroni-based HA architecture with automatic failover under 30 seconds, connection pooling via PgBouncer, and operational runbooks reducing on-call response time. Implemented staged deployment, failover drills, and production readiness validation before go-live.

⚡ Impact: 99.99% database uptime · Failover in <30 seconds

PostgreSQL · Patroni · PgBouncer · Ansible · etcd

PostgreSQL + Patroni: Implemented resilient clustered database architecture with automated failover management.
PgBouncer: Added connection pooling to improve throughput and protect database stability under load.
Ansible: Automated node provisioning, configuration consistency, and repeatable deployment operations.
etcd: Provided distributed consensus backing for leader election and HA control-plane state.
Business Outcome: Improved uptime and service continuity through faster failover and more stable database operations.
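
The two headline figures above can be related with simple arithmetic: an availability target fixes a downtime budget, and the failover time determines how many failovers fit inside it. This sketch assumes a 365-day year and assumes every outage is a failover lasting the full 30 seconds; both are simplifications for illustration.

```python
def downtime_budget_seconds(availability: float, period_days: int = 365) -> float:
    """Allowed downtime over the period for a given availability fraction."""
    return (1.0 - availability) * period_days * 24 * 60 * 60


budget = downtime_budget_seconds(0.9999)  # about 3153.6 s, roughly 52.6 minutes a year
failovers_within_budget = int(budget // 30)  # worst-case 30-second failovers that fit

print(round(budget, 1), failovers_within_budget)
```

Under these assumptions the yearly budget absorbs about a hundred worst-case failovers, so planned drills barely dent the target.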

High Availability

📊 Observability Stack Modernisation

Unified metrics, logging, and alerting to meet a 5-minute MTTR target for P1 incidents through improved detection, context-rich dashboards, and automated runbook triggers. Standardised deployment templates for exporters, alerts, dashboards, and service-level monitoring.

⚡ Impact: MTTR reduced from 45 min to 5 min for P1 incidents

Prometheus · Grafana · Loki · Alertmanager · PagerDuty

Prometheus: Collected service and infrastructure metrics with SLI/SLO-aligned monitoring coverage.
Grafana: Built actionable dashboards for engineering, operations, and stakeholder reporting visibility.
Loki: Centralised log aggregation and correlation to speed up production troubleshooting workflows.
Alertmanager + PagerDuty: Implemented structured alert routing, escalation policies, and on-call incident response.
Business Outcome: Reduced incident resolution time and strengthened operational confidence with proactive observability.
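
MTTR, the metric quoted above, is the mean of the time from detection to resolution across incidents. The following is a minimal sketch; the function name and the incident timestamps are hypothetical sample data, not records from this project.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def mttr(incidents: List[Tuple[datetime, datetime]]) -> timedelta:
    """Mean time to resolve, given (detected_at, resolved_at) pairs."""
    if not incidents:
        raise ValueError("at least one incident is required")
    total = sum((resolved - detected for detected, resolved in incidents), timedelta())
    return total / len(incidents)


# Hypothetical P1 incidents lasting 4 and 6 minutes.
sample = [
    (datetime(2024, 1, 1, 10, 0), datetime(2024, 1, 1, 10, 4)),
    (datetime(2024, 1, 2, 15, 30), datetime(2024, 1, 2, 15, 36)),
]
print(mttr(sample))
```

Where the clock starts (first alert, page acknowledged, or ticket opened) changes the number considerably, so the definition should be fixed before comparing periods.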

Observability

💰 Kubernetes Cost Optimisation

Reduced cloud spend by 35% through right-sizing, node autoscaling tuning with Karpenter, workload scheduling improvements, and spot instance strategies. Rolled out optimisation policies in phased deployments with workload safety checks and performance baselines.

⚡ Impact: 35% monthly cloud cost reduction · $200k+ annualised savings

Kubernetes · Karpenter · AWS · Kubecost · Python

Kubernetes: Analysed workload placement and right-sized requests/limits to cut waste safely.
Karpenter: Tuned autoscaling behaviour and node lifecycle policies for cost-efficient capacity management.
AWS: Optimised compute purchasing mix and infrastructure choices for sustained monthly savings.
Kubecost + Python: Built cost visibility analytics and automation workflows to enforce optimisation actions.
Business Outcome: Lowered recurring cloud costs while maintaining performance and deployment reliability.
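
Right-sizing requests, as described above, is commonly done by taking a high percentile of observed usage and adding headroom. The sketch below uses a nearest-rank percentile and integer arithmetic; the function names, the 95th percentile, the 20% headroom, and the usage samples are assumptions for illustration.

```python
from typing import Sequence


def percentile(values: Sequence[int], pct: int) -> int:
    """Nearest-rank percentile over integer samples (pct between 1 and 100)."""
    if not values:
        raise ValueError("at least one sample is required")
    ordered = sorted(values)
    rank = max(1, (pct * len(ordered) + 99) // 100)  # ceiling of pct * n / 100
    return ordered[rank - 1]


def recommend_cpu_request(usage_millicores: Sequence[int], pct: int = 95,
                          headroom_pct: int = 20) -> int:
    """Recommend a CPU request in millicores: percentile usage plus headroom, rounded up."""
    base = percentile(usage_millicores, pct)
    return (base * (100 + headroom_pct) + 99) // 100


# Hypothetical CPU usage samples in millicores, including one spike.
usage = [100, 120, 110, 130, 400, 115, 125, 105, 135, 140]
print(recommend_cpu_request(usage))          # driven by the spike at the 95th percentile
print(recommend_cpu_request(usage, pct=90))  # a lower percentile ignores the spike
```

The choice of percentile is the main lever: a lower one saves more but leaves less room for bursts, which is why changes of this kind are rolled out against performance baselines.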

Cost

Open Source & Public Work

Explore implementation details and live updates from public repositories on GitHub.


🗂️ Featured Projects — Built for Scale and Reliability