Tuesday, 10 March 2015

Book for everyone working in IT

The first part sets the scene and offers no insights.
It recalls (rehearses)

  • The theory of Constraints
  • Lean production (Toyota Production System)
  • Total Quality Management

which all agree that WIP (work in progress) is a silent killer. The release of jobs and materials is therefore the most critical mechanism in the management of a plant.
It's paramount to control the release of work into IT Operations.
The most constrained resources need to serve the goal of the entire system, not just one silo.

It talks about the Three Ways:

  • the First Way helps to understand how to create a fast flow of work from Development to IT Operations (because that's what stands between the business and the customers)
  • the Second Way shows how to shorten and amplify feedback loops, so as to fix quality at the source and avoid rework
  • the Third Way shows how to create a culture that fosters experimentation, learning from failure, and the understanding that repetition and practice are prerequisites to mastery

There are four types of work:
  • business projects
  • internal IT projects
  • changes
  • unplanned work

Unplanned work is destructive: it is not really work, because it keeps you from doing the other three types.

A kanban board helps to release work and control WIP.
By halting all the projects but one (for example, all but Phoenix), we reduce the amount of WIP and therefore improve due-date performance. So a possible solution is for IT Operations to freeze all non-Phoenix work for two weeks. Development will freeze all deployments (while continuing to work on the non-Phoenix projects).
Then it is necessary to identify the areas of technical debt that Development can tackle to decrease the unplanned work created by problematic applications.
Before lifting the freeze, we need to decide how to prioritise the work.
In the book, Brent is a constraint, but he is a worker, not a work centre. Brent supports too many work centres, and that's why he is a constraint. The solution can be to standardise Brent's work so that other people can execute it. Standardisation makes it possible to enforce consistency and quality, and the documentation makes it possible to automate some of those tasks. Until the tasks are documented, Brent will always be a constraint, even with additional headcount.
To lift the freeze, the projects that are safe to release are the ones that don't require Brent, the constraint. So it's important to map how the work flows. More accurately, it is fundamental to build a bill of resources (the list of required work centres and resources) for each project, so as to get a handle on the department's capacity and on the demand placed on it. Only then is it possible to know whether new work can be accepted and to schedule it.
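As a rough illustration of the bill of resources idea, the sketch below records the hours each project needs from each work centre or resource, then lists the projects that do not touch the constraint and the resources whose demand exceeds capacity. The project names other than Phoenix, the hour figures, and the function names are all invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Project:
    name: str
    # Bill of resources: work centre or person -> hours required (made-up figures).
    bill_of_resources: dict = field(default_factory=dict)


def safe_to_release(projects, constraint):
    """Names of projects whose bill of resources does not include the constraint."""
    return [p.name for p in projects if constraint not in p.bill_of_resources]


def demand_by_resource(projects):
    """Total hours demanded from each resource across all projects."""
    totals = {}
    for project in projects:
        for resource, hours in project.bill_of_resources.items():
            totals[resource] = totals.get(resource, 0) + hours
    return totals


def overloaded(projects, capacity):
    """Resources whose total demand exceeds the available capacity (in hours)."""
    demand = demand_by_resource(projects)
    return {r: h for r, h in demand.items() if h > capacity.get(r, 0)}


# Hypothetical portfolio and capacities, for illustration only.
projects = [
    Project("Phoenix", {"Brent": 40, "DBA": 10}),
    Project("Laptop refresh", {"Service desk": 20}),
    Project("Monitoring", {"Ops": 15, "DBA": 5}),
]
capacity = {"Brent": 30, "DBA": 20, "Service desk": 20, "Ops": 20}

print(safe_to_release(projects, "Brent"))
print(overloaded(projects, capacity))
```

With these figures, the two projects that never touch Brent can be released, while Brent himself shows up as the only overloaded resource.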
The monitoring project elevates the constraint: it takes work away from Brent. That means elevating preventive work, which is at the heart of Total Productive Maintenance. Improving daily work is more important than doing daily work.
The Third Way is about putting tension into the system, so as to reinforce habits: resilience engineering tells us that we should routinely and frequently inject faults into the system, to make them less painful.
It doesn't matter what you improve, as long as you're improving. Otherwise, entropy guarantees that you are getting worse. Mike Rother calls this the Improvement Kata: practice and drills lead to mastery. Practising five minutes daily is better than once a week for three hours. A culture of improvement comes from creating habits.
The Second Way involves making wait time visible, so you know how long the work spends in a queue.
Improvement Kata: plan-do-check-act (PDCA).
Managing the handoffs is as important as throttling the release of work. The wait time for a given resource is the percentage of time that resource is busy divided by the percentage of time it is idle. If a resource is 99% utilised, then you have to wait 99 times as long as if that resource were 50% utilised (99/1 against 50/50). So managing work across departments is at least ten times more difficult than prioritising work inside a department, because of handoffs.
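The busy-over-idle rule is easy to check with a few lines of Python. This is only a sketch of the formula as stated in these notes; the function name is made up, and the result is a relative number of time units, not a measured duration.

```python
def relative_wait_time(percent_busy):
    """Wait time as the ratio of percent busy to percent idle."""
    if not 0 <= percent_busy < 100:
        raise ValueError("percent_busy must be in the range [0, 100)")
    percent_idle = 100 - percent_busy
    return percent_busy / percent_idle


# Wait time grows slowly at first, then explodes as utilisation nears 100%.
for busy in (50, 80, 90, 95, 99):
    print(f"{busy}% busy -> {relative_wait_time(busy):g} units of wait")
```

The steep end of the curve is why a fully loaded resource such as Brent turns every handoff into a long queue.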
To prioritise the projects, build three lists: one of projects that require the constraint, one of projects that increase the constraint's throughput, and one with everything else. Then identify the top projects in each list.
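A minimal sketch of the three lists follows. It rests on several assumptions that are not in the book: each project carries a hypothetical business `value` score used for ranking, a flag saying whether it increases the constraint's throughput, and a project that both needs and relieves the constraint is filed under "elevates". The project names other than Phoenix are invented.

```python
def three_lists(projects, constraint, top_n=2):
    """Split projects into the three lists and return the top names in each."""
    lists = {
        "requires_constraint": [],
        "elevates_constraint": [],
        "everything_else": [],
    }
    for project in projects:
        if project["elevates_constraint"]:
            # Assumption: relieving the constraint takes precedence over needing it.
            key = "elevates_constraint"
        elif constraint in project["resources"]:
            key = "requires_constraint"
        else:
            key = "everything_else"
        lists[key].append(project)

    # Rank each list by the hypothetical value score, highest first.
    return {
        key: [p["name"] for p in sorted(items, key=lambda p: p["value"], reverse=True)[:top_n]]
        for key, items in lists.items()
    }


# Invented portfolio, for illustration only.
portfolio = [
    {"name": "Phoenix", "resources": {"Brent", "DBA"}, "elevates_constraint": False, "value": 9},
    {"name": "Monitoring", "resources": {"Ops"}, "elevates_constraint": True, "value": 7},
    {"name": "Runbook automation", "resources": {"Brent"}, "elevates_constraint": True, "value": 8},
    {"name": "Laptop refresh", "resources": {"Service desk"}, "elevates_constraint": False, "value": 3},
    {"name": "Data warehouse", "resources": {"Brent"}, "elevates_constraint": False, "value": 5},
]

print(three_lists(portfolio, "Brent"))
```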
Sometimes IT is under-scoped and jeopardises the achievement of business goals; sometimes it is over-scoped because it focuses on unnecessary tasks.
Some of the IT controls John held near and dear aren't needed because other parts of the organization already mitigate those risks. The wisest auditors say there are three internal control objectives: gaining assurance on the reliability of financial reporting, compliance with laws and regulations, and the efficiency and effectiveness of operations.
The First Way is about the entire organization achieving its goals, not just one part of it.
Measurements of company goals depend on IT most of the time.
IT objectives are a prerequisite for measuring company goals.
You must leave the realm of IT to understand where the business relies on IT to achieve its goals.
If the KPI is on-time delivery, a new forward-looking KPI is the percentage of vehicles that have had their required oil change performed. IT needs preventive maintenance too, such as adding fragile applications to the replacement list.
Value chain of how IT jeopardises company goals.
Sales forecast accuracy is jeopardised by a poor understanding of customer needs and wants. If you know which products are out of stock in the stores, it is possible to increase sales. The sales reports from the CRM are important for A/B testing the offers and replicating the winning ones.
Quick time to market and failing fast are needed in competitive environments. The longer the product development cycle, the longer the company's capital is locked up without giving a return. With smaller and more frequent releases you deliver cash back faster.
Repetition, especially where teamwork is required, creates trust and transparency.
The First Way is about controlling the flow of work from Development to IT Operations. It was improved by freezing and throttling project releases. But the large batch sizes of deployments still cause unplanned recovery work downstream.
To master the Second Way means to create constant feedback loops from IT Operations back into Development, designing quality into the product at earlier stages and embedding knowledge where it is needed. Nine-month releases don't allow this: you need faster feedback, faster detection and recovery, and the prevention of problems. This means stopping the production line when a build fails.
In any system of work, the theoretical ideal is a single-piece flow, which maximises throughput and minimises variance. You get there by minimising batch sizes. By increasing the number of features in each release you are doing the opposite and you lose the ability to control variance from one release to the next.
The Suzuki Hayabusa has no reverse gear, and neither should the flow of work. Work going back upstream means rework because of defects, lack of specification, ... So a change made in the production DB but not replicated in staging is a problem.
To reduce the takt time, which is the cycle time needed to keep up with customer demand, you need to observe all the tasks required to do a changeover, and then put in place preparation steps and improvements that bring the changeover time down. At the factory they combined four work centres into one, doing two operations in a single machine, eliminating manual steps and automating the work cycle. So when a defect was found, just one piece was wasted and the rest of the batch went through a fixed work cycle. WIP was eliminated because no work centre overproduced anymore. A one-step environment creation and deployment procedure is the goal.
Continuous delivery, version control for everything needed to build the environment, automate the environment creation process. Build process is the bottleneck. Deployment pipeline. Variance in the deployment process is bad.
Value stream map.
To analyse the sales data, the data were copied from the production DB into an open source DB completely detached from the operational DB, so performance was not impacted and changes could be made without putting projects at risk. It is important, though, to standardise which types of database are used, because if every project spawned a new DB on a whim, there would be sprawl.
Cloud computing is possibly a form of outsourcing.
Evil Chaos Monkey: constantly trying to exploit security holes, storming the application with malformed packets, installing backdoors, ... This is the fastest means to establish the Third Way, creating a culture that reinforces the values of taking risks and learning from failures, and the need for repetition and practice to create mastery. Quality is improvement in the daily work.
IT is not a department; it is pervasive, like electricity. It's a skill. Any business manager leading a project or a team without this skill, without knowing what technology can or can't do, will fail.
Every COO worth their salt will come from IT.
The relationship between business and IT is a dysfunctional marriage: each of them feels powerless and hostage to the other. But if you embed IT into business operations, then you have no tension, no marriage, and maybe no IT department either.

IT is a critical competency and a predictor of company performance. IT is critical for customer acquisition.
Achieving a high level of reliability requires that changes be made frequently. 23k releases a day. Those companies that achieve the goal of frequent deploys are two times more likely to exceed profitability....
All the applications should produce useful telemetry.
There should be a hypothesis-driven culture, doing nothing without measuring.
There is a conflict: protecting sales commitments vs controlling manufacturing costs, increasing vs reducing inventory. This conflict is solved by Lean principles: reducing batch sizes, reducing WIP, shortening and amplifying the feedback loop. DevOps means applying Lean principles to the IT value stream. DevOps may push the Devs to take more responsibility for coding deployments and monitoring.

Toyota Kata: two-week improvement cycle, improve anything. Managers become coaches and mentors, and PDCA is framed in such a way that people take small steps every day. The two-week improvement cycle pushes towards improvements; otherwise entropy causes a decline. Six work centres became four, only for entropy to bring them back to six.
Stop estimating work by gut feeling and base estimates on historical data.
Reduce overall lead time, instead of letting Devs and Testers each optimise for themselves.
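One way to base estimates on historical data, offered here only as a sketch, is to read a percentile off the lead times of past work items instead of guessing. The choice of the 85th percentile, the nearest-rank method, the function name, and the sample history are all assumptions for the example, not something the book prescribes.

```python
import math


def percentile_forecast(lead_times_days, percentile=85):
    """Nearest-rank percentile of past lead times, used as the estimate."""
    if not lead_times_days:
        raise ValueError("need at least one historical lead time")
    ordered = sorted(lead_times_days)
    # Nearest-rank: the smallest value with at least `percentile`% of the data at or below it.
    rank = max(1, math.ceil(percentile * len(ordered) / 100))
    return ordered[rank - 1]


# Made-up lead times, in days, of ten completed work items.
history = [2, 3, 3, 4, 5, 5, 6, 8, 13, 21]

print(percentile_forecast(history))       # 85th percentile
print(percentile_forecast(history, 50))   # median-like estimate
```

An answer such as "85% of similar items finished within this many days" is a statement about measured lead time, which also fits the goal of reducing lead time rather than optimising a single team.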

Merging two operations into a single machine: this applies when you run regression tests on a batch of commits all together before deploying, or reject several features all together. But if you test and release one feature at a time, you have achieved single-piece flow.
