Devops Fundamentals

DevOps stands for Development & Operations.
It means shared responsibilities between the development and operations teams.
The development team work being aware of the challenges faced by operations and contribute and ops team work more like developers with proper flow and process.
DevOps is not a framework or a workflow.
It's a culture that is overtaking the business world.
DevOps ensures collaboration and communication between software engineers (Dev) and IT operations (Ops). With DevOps, changes make it to production faster, Resources are easier to share, And large-scale systems are easier to manage and maintain.

Basics

Info

Values:
- C (Culture) - How people interact
- A (Automation) - Automation at the heart of solution to the problem
- M (Measurement) - What to measure and incentivize accross the organization
- S (Sharing) - Feedback loops for discrete regular improvements based on transparency
Principles:
- Systems Thinking
  - Consider the outcome of the entire pipeline or value chain
  - For example adding app servers might overload the db with connections
  - In case of IT the process might be helpful for the sub org but making the delivery slow
  - Systems thinking must be used as guidance to set proper success criteria and evaluation of the system
- Amplify Feedback loops
  - Effective feedback is what drives any control loop designed to improve the process
  - Use amplify feeback loops to help when creating multi-team processes, visualizing metrics & designing delivery flows
- Continuous experimentation and learning
  - Focus on doing rather than talking about it
  - Team must be ready to learn new things and the best way is to try to see if it works
  - Use the approach to define team processes and standards, and as part of the leadership style
Playbook (set of methodologies):
- People over process over tools
  - Choose people, then define process and then choose the tools and not the other way around
- Lean management
  - Work in small batches
  - Work within progress limits
  - Feedback loops
  - Visualization
- Continuous delivery
  - Code and release code regularly and in small batches
- Visible Ops-style change control:
  - Eliminate fragile artifacts
  - Create repeatable build process
  - Manage dependencies
  - Create and env of continuous improvement
- Infrastructure as Code
  - System can and should be treated as code
  - This standardizes the infrastructure and reduces the effort
Practices:
- Incident management system
- Devs on call
- Public status pages
- Blameless postmortems
- Embedded teams
- The cloud
- Andon cord
- Dependency Injection
- Blue green deployment
- Chaos monkey
Tools must be:
- Programmable
- Verifyable
- Well behaved with the other parts of the system

DevOps: Culture problem

Info

When development and operations start working in silos problems begin to arise
Problem is both groups are incentivised differently
Both groups optimise their flows but it creates a less efficient overall system
This needs to be solved by a culture shift
Ways to do it:
- Communication (Blameless postmortems):
  - Have a postmortem within 24-48 hours of the outage
  - Build a timeline of the events
  - Analyse the issues and discuss possible solutions
  - Discuss how the customers were affected
  - Document the learnings
  - Discuss how can detection be done earlier in similar cases in the future
  - Optimize for failure and recover than just prevention
- Communication (Transparent uptime):
  - Admit failure
  - Have an open communication channel
  - Be authentic
  - Have a POC
- Collaboration:
  - Have a team that has people working on both dev and ops aspects
  - Practice openness:
    - Open chatrooms
    - Open wiki pages etc.
- Management best practices:
  - Cross functional teams
  - Help people through the change
  - Use Lean Agile processes
- Kaizen (Continuous Improvement):
  - Principles:
    - Good process bring good results
    - Go see for yourself
    - Speak with data manage by facts
    - Take action to correct and contain root causes
    - Work as a team
    - Its everyone's business
  - 5 Whys
    - Ask questions in repeated iterations
    - Do not accept time constraints as an answer find out what lead to the delay
    - Do not accept manual failure as answer find out what process failed

Building blocks of DevOps

Info

The main building blocks of DevOps are:
Agile
- DevOps is deep rooted in Agile
- Its highky suggested DevOps be implemented in conjunction with Agile as they are highly complimentary
- DevOps has roots in Agile and the process are iterative which generates quick product or solution delivery.
Lean
- Principles:
  - Eliminate waste
  - Amplify learning
  - Decide as late as possible
  - Decide as fast as possible
  - Empower the team
  - Build integrity
  - See the whole
- Techniques:
  - Kaizen
  - Value-Stream mapping
ITIL, ITSM and SDLC
- These are prescriptive models mostly predecessors of modern day DevOps

Infrastructure as Code

Info

Infrastrucure as Code is a complete programmatic approach to infrastructure management
It allows to manage infrastructure with the same principles as software development
With IaC we can code the scripts in an IDE, run tests, apply decision making based on state and deploy automatically
For example, we can completely describe an AWS system as code using a format called cloud formation which enables to replicate the system all the time
Configuration Management:
- Concepts:
  - Provisioning: Process of making a server ready for operation using hardware, OS, system drivers & network connectivity
  - Deployment: Automatically deploying and upgrading applications on a server
  - Orchestration: Co-ordinating operations within multiple systems
  - Configuration management: Overarching term for management of change control for system configuration after initial provision
- Techniques - how tools approach configuration management
  - Imperative / procedural: Commands necessary to produce a state and defined and executed
  - Functional / Declarative: We define the state and the tool converges the exisiting configuration based on the desired state
  - Idempotent: Repeat execution equals same exact model
  - Self-service: No need for manual intervention other than the requesting user
Common toolchain:
- For AWS: provisioning can be done via AWS cloud formation
- For Azure: Azure resource manager
- Terraform: Allows to provision in a more abstract way which can be translated to multiple platforms

Continuous Integration\Delivery

Info

Continuous Integration: Automatically building and unit testing the entire application at regular intervals ideally at each code check-in
Continuous Delivery: Automatically deploying every change to a production like environment and performing integration and acceptance testing
Continuous Deployment: After automated testing automatically deploying the change to production
Advantages:
- Decreased time to market
- Quality increase
- Go live is not an event
- Lead time for changes is reduced
- State of Devops: having short lived feature branches and having less than 3 overall branches improves efficiency
- Lower mean time to recover
CI practices:
- Short build times. Coffee test
- Commit really small bits
- Don't leave the build broken
- Trunk based development flow
- Don't allow flaky tests
- The build must return a status, log and artifact marked with the build number
CD practices:
- No separate artifact for different environments
- Artifacts should be immutable
- Staging should be a copy of production
- Stop deployes if a previous step fails - (Andon cord)
- Deployments should be idempotent

Reliability Engineering:

Info

Ability of a system or component to function for a specified period of time
MTTR: Mean time to recovery
MTBF: Mean time between failures
Reliability engineering typically involves embedding product knowledge into operations and operational knowledge into product
Design for Operation:
- Use design patterns (Gang of 4)
- Use reliability patterns like circuit breaker (Netflix, Hysterix library)
- 12 factor app
- The success of the overall app relies on using the right patterns at the very beginning of the toolchain
- Have operational intelligence with the development capabilities
- This will provide better code shipped which is resilient and performant
- Netflix's chaos monkey actively kills servers and developers need to factor in this when they create applications
Operate for design:
- Use a lean approach to monitoring and metrics
- Build a MVP, target a few systems, learn, repeat and go deeper as needed
- Build just enough metrics to gain insights and not overload the systems
- Areas for monitoring service uptime, application uptime, security, system usage, etc.
- Have a minimal centralized logging mechanism
- SRE toolchain:
  - Monitoring: Grafana, Containers (Prometheus)
  - Logging: Splunk, ELK stack (Elasticsearch, logstash, kibana)
  - Statuspage.io provides status pages as a service
  - Security - Checkmarx (FOSS scans)