Startups and daily operations
Life is often unfair: when faced with obstacles, people tend to make the wrong decision unless they have the experience to make the right one. Very recently, we had discussions about the operations process at our startup. The opinions below apply only to web startups. Also, pardon my references to Amazon.com (that's where I used to work).
It is common practice at Amazon.com to put engineers on call on a rotating basis. They are the first to respond to any issues that arise in the running systems. Typically, two people, labelled primary and secondary, are responsible for resolving these issues. Amazon.com also measures the severity of an issue by its impact on revenue, which is derived from its impact on customers.
Now, back to reality: operations is something that kills engineers. Resolving tickets is the last thing any engineer wants to do. It is fine for a big company like Amazon.com to dedicate engineers to operations, since it has a large pool of people to draw from.
In a startup, where the business is always short on resources, it is extremely important to build the system with much higher quality right from the start. With roughly four engineers in a typical startup, it doesn't make sense to dedicate two of them to fixing operational issues. Moreover, fixes for operational issues are typically patches or quick hacks, which are no substitute for avoiding the issue in the first place. And if there are so many issues in the system that you need pagers and a trouble ticket system to manage them, then we are probably talking about either an established big company or a bunch of morons trying their hand at programming.
Moreover, startups typically run systems under light enough load that resource exhaustion rarely brings them to a halt. The prime differentiator for any web startup is strikingly innovative features that benefit the end user. For companies like Amazon.com or Google.com, a defaced page is a very serious issue because it affects their reputation. For a startup, however, a defaced page is completely fine; how seriously it is taken depends only on the standards its developers set for themselves. As long as the business is unaffected, minor hitches should never be considered serious enough to justify buying a trouble ticket system.
In summary, when revenue and reputation are not at stake, it is better to focus on growing them than to waste time on daily operations with trouble ticket systems.