Three related problems: developers blocked, developers thrashing, and developers testing in Production.
Problem 1: Developers Blocked
Developers are prevented from working 20-25% of each day by Pre-Prod environment outages. Staff is demoralized. Productivity below norms and dropping.
Problem 1: Diagnosis
The Pre-Prod environments were inadequate: insufficient processing power, missing apps and services, and outages that sometimes lasted days. When they were up and running, complex data setups were frequently ruined because other developers used or configured the same tables and records. Builds ran at any time and bogged down or took down the environments. Environment outages were not communicated, and fixes were days in coming. Some development groups needed the latest build while others needed a stable build of each component.
Problem 1: Solution
Environments were split into pairs so each project had its choice of the latest build or the last stable build. Scheduled builds were introduced, at noon and 6:00 pm to coincide with typical meal breaks. Accountability was established for check-ins causing broken builds. A support team was created to own and maintain all Pre-Prod environments, with coordinated Incident Management. A dashboard was implemented with environment information such as availability and scheduled maintenance. Application dependencies were mapped.
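As an illustration of how a dependency map can narrow testing to the systems a change actually touches, here is a minimal Python sketch. The application names and the `DEPENDS_ON` map are hypothetical, and the approach shown (a breadth-first walk over an inverted dependency map) is one common way to do this, not necessarily what was used on this engagement.

```python
from collections import deque

# Hypothetical map: each application -> the applications it depends on.
DEPENDS_ON = {
    "billing": ["customer-db", "auth"],
    "portal": ["billing", "auth"],
    "reports": ["customer-db"],
    "auth": [],
    "customer-db": [],
}


def impacted_by(changed, depends_on):
    """Return every application that directly or transitively depends on `changed`."""
    # Invert the map: dependency -> applications that use it.
    used_by = {}
    for app, deps in depends_on.items():
        for dep in deps:
            used_by.setdefault(dep, set()).add(app)

    # Breadth-first walk outward from the changed application.
    impacted = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for app in used_by.get(current, ()):
            if app not in impacted:
                impacted.add(app)
                queue.append(app)
    return impacted


if __name__ == "__main__":
    print(sorted(impacted_by("auth", DEPENDS_ON)))
```

With a map like this, a change to one component yields a short list of applications to retest instead of the whole portfolio.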
Problem 1: Results
Increased Pre-Prod uptime to 99.9%. Testing focused on impacted systems rather than diluted across all applications.
"Prior to (these improvements), system outages were the #1 productivity issue in App Tech. Now it's a non-issue." – App Tech Development Director
Problem 2: Developers Thrashing
Developers were thrashing: assigned to multiple projects at 100% allocation each, and simultaneously pulled back to finish projects that had been declared complete but were far from it. Refunds to customers for broken functionality and missed dates exceeded $20M annually. Customer frustration was growing and goodwill dropping.
Problem 2: Diagnosis
Insufficient time was allocated to complete a project before developers were assigned to a new one. Leaders competing over the best developers would each assign the same developer at 100% to simultaneous projects. VPs were compensated based on when projects were deployed into Production rather than when they were completed, so unfulfilled requirements were logged as bugs and fixed post Go-Live by developers now assigned to other projects. This stalled the new projects while the old ones were being finished, and resulted in fewer projects completed over the course of a year and thus lower VP bonuses.
Problem 2: Solution
Extended the length of each project by three weeks to contain the vast majority of defects within the project itself, before Go-Live. Stopped allocating developers to projects without considering what else was on their plate. Convinced VPs that keeping the team together until the project was done, then deploying, would increase department velocity and thus increase their bonuses.
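The allocation check described above can be sketched in a few lines of Python. This is only an illustration under assumed inputs: the `(developer, project, percent)` tuples, the names, and the 100% limit are hypothetical, not data or tooling from the engagement.

```python
from collections import defaultdict


def overallocated(assignments, limit=100):
    """Return {developer: total_percent} for anyone allocated above `limit`.

    `assignments` is an iterable of (developer, project, percent) tuples.
    """
    totals = defaultdict(int)
    for developer, _project, percent in assignments:
        totals[developer] += percent
    return {dev: total for dev, total in totals.items() if total > limit}


# Hypothetical assignments: dev_a is booked at 100% on two projects at once.
ASSIGNMENTS = [
    ("dev_a", "Project 1", 100),
    ("dev_a", "Project 2", 100),
    ("dev_b", "Project 1", 60),
    ("dev_b", "Project 3", 40),
]

if __name__ == "__main__":
    print(overallocated(ASSIGNMENTS))
```

Running a check like this before each new assignment surfaces double-booked developers before the project starts rather than after it stalls.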
Problem 2: Results
Better, more accurate forecasting. Virtually eliminated developers being pulled back to finish previous projects. Projects truly completed before Go-Live. 3-4 more projects completed per year. Higher bonuses for VPs.
Problem 3: Developers Testing In Production
With unusable Pre-Prod environments and applications placed into Production that were incomplete or bug-laden, developers opted to test in the Production system, with sometimes dire consequences.
Problem 3: Diagnosis
Test environments were incomplete and inconsistent, with some services available only in specific environments, and some servers vastly undersized. Equally important, there was virtually no test data. With major contention over what environments were available and no good way to test even if an environment was obtained, projects routinely tested in Production after their deployment date.
Problem 3: Solution
Optimize increased test data from 3% to 100% of Production data, using FlexClone technology to provide virtual test environments for each developer and tester, without the overhead of replicating large amounts of test data. Repurposed former Production servers to upgrade and provision the test environments so they were complete. Built new servers using new tech at a fraction of the cost of traditional blades.
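The reason cloning avoids replicating the full data set is that each clone shares the baseline and stores only its own changes. The toy Python sketch below illustrates that copy-on-write idea with an in-memory `ChainMap`. It is not FlexClone's API or how the storage layer is implemented, and the record keys and values are made up for illustration.

```python
from collections import ChainMap

# Hypothetical shared baseline standing in for the full Production data copy.
BASELINE = {"cust:1": "Alice", "cust:2": "Bob"}


def make_clone(base):
    """Return a writable view: writes land in a private dict, reads fall through to `base`."""
    return ChainMap({}, base)


dev_a = make_clone(BASELINE)
dev_b = make_clone(BASELINE)

# dev_a edits a record; only the changed record is stored in dev_a's private layer.
dev_a["cust:1"] = "Alice (edited)"

if __name__ == "__main__":
    print(dev_a["cust:1"])      # dev_a sees its own change
    print(dev_b["cust:1"])      # dev_b still sees the baseline value
    print(len(dev_a.maps[0]))   # number of records dev_a actually stores
```

Each developer and tester gets an isolated, writable view, while the storage cost grows only with what they change.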
Problem 3: Results
Eliminated testing in Production, greatly reducing Production defects and associated risk. Increased available test data from 3% to 100%. (Previous attempts estimated the cost at $8-15M. We did it for $2M.) Virtualized environments so each developer and tester can work freely without stepping on each other.
"Your ideas were very implementable." – Operations Director