Tuesday, December 9, 2014

Lessons learned practicing Agile in a distributed environment

Agile software development practices have now become mainstream, with Scrum and Kanban being the most popular. Although a fundamental tenet of these practices is co-location, any medium-to-large organization will have teams spread across different geographical locations, often in different timezones. Over the past few years, I have worked in such a setup and want to describe my experiences.

What works:

Distributed yet co-located team: 

In this setup, even though the product or engineering organization is spread across multiple locations, each scrum team is co-located in its own location. This means that the team in each location has cross-functional members such as developers, QA, a scrum master and a business analyst. The team could also have a dedicated architect and DBA. Although there is a single product owner, the business analyst acts as a proxy product owner in each co-located team. The DevOps movement takes this further by bringing operations folks together with the dev team as well.

The team in each location has its own sprint planning, backlog grooming, stand-up and retrospective. The sprint review can be independent or combined with other scrum teams. It is best to have an engineering manager in each location who is accountable for the deliverables of his or her team.

Common backlog with feature grouping:

The product backlog, about which I have written earlier, is a key artifact for any agile product development team. It is recommended to have one product backlog with features grouped under themes. This allows the co-located scrum teams to focus on feature groups which can be included in product releases. When the team gets really large, a single product owner may become a bottleneck. In such a situation, there can be area product owners, each responsible for a particular area of the product.

Collocated release planning:

Release planning involves deciding the high-level scope of a release. In a distributed team, it is preferable to get the required members, such as the dev manager, leads (dev and QA), project manager and product owner, into a single location to plan the release. In order to plan a release, it is important to be ready with high-level estimates for the different features.

Face to face interactions:

It is often claimed that communication is largely non-verbal, and my experience bears this out: the best way to communicate is always face to face. In a distributed environment, this can be approximated through video calls and conferencing, and I have found that face-to-face interaction through video calls makes a huge difference. Most IM tools now have video call capabilities. I have also used Google Hangouts as a backup when the enterprise tool doesn't work well.

Invest in collaboration tools:

Most organizations use tools for collaboration and content management, but the best collaboration tool, in my opinion, is a wiki. It is lightweight and effective. Although many teams use physical agile boards, it helps to have an agile project-management tool to complement them. I have used JIRA Agile and VersionOne and found both useful.

Travel Budget:

Video communication helps with regular interactions among distributed teams, but nothing beats in-person interaction. I have found it very useful for team members to visit remote teams and work with them for at least one sprint. This also breaks down any barriers that may exist between team members.
Travel is always considered an expense and gets slashed as part of cost-saving efforts. However, in a distributed environment, it goes a long way to get people to meet once a year or, at the very least, at the beginning of a project.

Common engineering practices:

Each team may have its own work culture and schedule, but it is very important to follow common engineering practices. These include, but are not limited to, continuous integration, code review, a DoD (Definition of Done), coding standards and a version control strategy.

What doesn't work:

Distributed scrum:

I have seen and experienced many teams performing stand-ups and planning through conference calls and WebEx meetings. This is not only ineffective but inefficient as well. People generally multi-task in such meetings, and the constant context switching is highly inefficient.

Some organizations have adopted a totally distributed model where everyone is remote. This requires not only a different culture but also the effective use of a set of tools and practices to bridge the physical gap.

Remote employees:

Many teams claim to practice scrum where most team members are in one location but one or two employees are remote or home-based. In this case, the remote employees dial into a conference call for the scrum meeting, which is not very efficient, as mentioned above. Although occasional working from home is very common, in the past we adopted a practice where the entire team works from home on an agreed-upon day of the week. This way everyone is either in the office or remote.

Separated functional members:

Separate dev and QA teams are totally anti-agile. A common organizational setup is developers and QA reporting into separate hierarchies. This may be OK as long as they work as part of the same team. The same applies to analysts, architects and DBAs. Handing off work to functional team members who are remote doesn't work.

Detailed Estimation by remote teams:

Remote team members getting on a call and estimating the backlog doesn't work well. Estimation requires a common understanding of the project/backlog, which requires discussion. This is best done in a co-located setting.
   


Wednesday, September 18, 2013

Software Engineering - Operational Requirements

Over the years I have observed that functional requirements always get higher priority than non-functional requirements (NFRs). Some of these NFRs cover the day-to-day operational aspects of any software product. I plan to write about scalability and availability patterns in the future, but here I am attempting to list some of the key operational patterns.

Automated Deployment: In recent times, continuous delivery of software has gained a lot of momentum. One of the requirements for delivering software quickly is to have the deployment process completely automated, ideally as a single-click deployment. A precursor to automating deployment is to set up a continuous integration environment; tools like Hudson, Jenkins and TeamCity can be used for this. Tools like Puppet and Chef can help not only with automated deployment but with infrastructure automation as well.
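To make the idea concrete, here is a minimal Python sketch of what a "single-click" deployment driver could look like. The steps shown are placeholder echo commands, not a real pipeline; in practice each step would be an ssh/scp command or a CI job delegated to a tool like Jenkins.

```python
import subprocess

# Illustrative placeholder steps for a hypothetical app; a real script
# would fetch artifacts, stop/start services and run health checks.
DEPLOY_STEPS = [
    ["echo", "pulling latest artifact"],        # placeholder: fetch build
    ["echo", "stopping app on target host"],    # placeholder: ssh stop
    ["echo", "copying artifact to target"],     # placeholder: scp/rsync
    ["echo", "starting app and health check"],  # placeholder: ssh start
]

def deploy(steps=DEPLOY_STEPS):
    """Run each step in order; abort on the first failure."""
    for step in steps:
        result = subprocess.run(step, capture_output=True, text=True)
        if result.returncode != 0:
            raise RuntimeError(f"deploy failed at step: {step}")
    return "deployed"

if __name__ == "__main__":
    print(deploy())
```

The value of expressing deployment as an ordered, fail-fast list of steps is that the same script runs identically from a developer laptop or a CI server, which is what makes single-click deployment possible.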

Monitoring and Alerting: A system should be able to alert on events that identify an outage or degradation of its performance. This means that all critical events should be identified and configured to generate an alert. These alerts should be not only tested but also validated in the production environment. Tools such as CA Unicenter, Zabbix, Nagios and Ganglia can be used for this purpose. Most load balancers can also be used for application health checks.
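As a sketch of the core idea, the snippet below shows threshold-based alerting in Python. The metric names and limits are made up for illustration; a real setup would rely on a tool like Nagios or Zabbix to collect metrics and route alerts.

```python
def check_metrics(metrics, thresholds):
    """Return a list of alert messages for metrics breaching their limits."""
    alerts = []
    for name, limit in thresholds.items():
        value = metrics.get(name)
        if value is not None and value > limit:
            alerts.append(f"ALERT: {name}={value} exceeds limit {limit}")
    return alerts

# Hypothetical current readings and limits.
current = {"cpu_percent": 92, "disk_percent": 60, "error_rate": 0.2}
limits = {"cpu_percent": 85, "disk_percent": 90, "error_rate": 0.05}

for alert in check_metrics(current, limits):
    print(alert)  # cpu_percent and error_rate breach their limits here
```

The point is that "all critical events should be identified and configured" translates into an explicit, reviewable table of thresholds rather than ad hoc checks scattered through the code.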

Fault tolerance: The key idea behind fault tolerance is that a system should be resilient and continue to function not only under normal conditions but also under unexpected circumstances. Failures are inevitable. Developing a fault-tolerant system also involves a change in mindset during the design, development and testing phases. It means asking the question: "What are all the ways this can go wrong?" Every component of the system must be reviewed to determine failure points, their impact and recovery from failure. An FMEA (Failure Mode and Effects Analysis) exercise can help with such an analysis. Most software systems have integration points and, as mentioned in Release It!, every integration point will eventually fail in some way, and you need to be prepared for that failure.

Throttling: This pattern refers to the idea that a system should protect itself and not harm the other systems it is integrated with. This is achieved by setting a throttling parameter beyond which the system will reject additional requests. In essence:
  • Applications should throttle client requests
  • Databases must throttle total requests
  • Throttled events must be logged and monitored
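One common way to implement such a throttle is a token bucket. Below is a minimal single-threaded Python sketch; the rate and capacity values are purely illustrative, not recommendations.

```python
import time

class TokenBucket:
    """Allow short bursts up to `capacity`, refilling `rate` tokens/second."""

    def __init__(self, rate, capacity):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self):
        """Return True if a request may proceed, False if it is throttled."""
        now = time.monotonic()
        # Refill tokens in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller should log/monitor this rejection

bucket = TokenBucket(rate=10, capacity=5)
results = [bucket.allow() for _ in range(8)]  # a burst of 8 requests
print(results.count(True))  # roughly the first 5 pass, the rest throttle
```

Note the last line of the pattern: a production version would increment a metric and log each `False` result, so throttled events stay visible to monitoring.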
Timeouts: A system should protect itself by adding a timeout to every external connection. A well-placed timeout provides fault isolation, preventing problems in other systems, subsystems or devices from impacting your system. Timeouts are also relevant within a single application: any resource pool that blocks threads must have a timeout to ensure that threads are eventually unblocked whether resources become available or not. Related to timeouts is a mechanism for retries. Most causes of a timeout involve problems in the network or the remote system that won't be resolved right away, so an immediate retry may simply result in another timeout. In such cases it is recommended to queue the operation and retry later.
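The timeout-then-queue-for-retry idea can be sketched as follows. The address used (192.0.2.1 is a reserved TEST-NET address) is chosen so the connection attempt fails; in a real system a background worker would drain the retry queue later.

```python
import queue
import socket

retry_queue = queue.Queue()

def call_remote(host, port, timeout=0.5):
    try:
        # Every external connection gets an explicit timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except OSError:
        # Don't retry immediately; park the operation for a later attempt.
        retry_queue.put((host, port))
        return "queued for retry"

print(call_remote("192.0.2.1", 80))  # reserved address, so this fails
print(retry_queue.qsize())
```

Without the `timeout` argument, a dead remote host could block the calling thread for minutes, which is exactly the fault propagation this pattern is meant to prevent.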

Logging: A system informs about its activity through logging while not impairing its operation. Logging must ensure not only that critical events are logged but also that key metrics are available to the infrastructure that consumes and reports on these events. Things to consider:
  • Decouple the application from logging resources
  • Application processing continues even if logging resources become unavailable; this can be achieved through asynchronous logging
  • Automate log rotation and archiving
  • Error codes and their descriptions are defined and documented
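The first two points above can be sketched with Python's standard library, which ships an asynchronous logging pair (QueueHandler/QueueListener, available since Python 3.2): the application thread only enqueues records, and a listener thread writes them to the real sink.

```python
import logging
import logging.handlers
import queue

log_queue = queue.Queue(-1)  # unbounded handoff queue

# The application is decoupled from the sink: it only pays for an enqueue.
logger = logging.getLogger("app")
logger.setLevel(logging.INFO)
logger.addHandler(logging.handlers.QueueHandler(log_queue))

# The listener drains the queue on its own thread; the sink here is a
# stream handler, but it could be a file handler or a log shipper.
listener = logging.handlers.QueueListener(log_queue, logging.StreamHandler())
listener.start()

logger.info("order processed")
listener.stop()  # flushes remaining records and stops the listener thread
```

If the sink slows down or stalls, records accumulate in the queue while application threads keep running, which is the "processing continues" guarantee the bullet list asks for.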
Instrumentation: A system should expose key system and application metrics, as well as errors, which can be used for tracking and alerting. Instrumentation also helps with troubleshooting and root cause analysis: it identifies the highs and lows of a system, from which historic patterns can be created. Data from instrumentation can be used for capacity planning too. Appropriate thresholds can be set on these metrics so that proactive action can be taken when a critical limit is reached. Two broad categories of instrumentation are
  1. System Metrics: These are metrics for infrastructure components such as hardware, OS and databases, covering CPU, memory, disk space, network utilization and file descriptors utilized.
  2. Application Metrics: These could include response times, error counts, heap/shared memory usage, threads, inbound/outbound connections etc.
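A bare-bones, in-process version of the application-metrics category can be sketched as below. The metric names are illustrative, and a real system would export these counters and timings to a monitoring backend rather than keep them in memory.

```python
import contextlib
import threading
import time

class Metrics:
    """Thread-safe in-process registry for counters and timings."""

    def __init__(self):
        self._lock = threading.Lock()
        self.counters = {}
        self.timings = {}

    def incr(self, name, by=1):
        with self._lock:
            self.counters[name] = self.counters.get(name, 0) + by

    @contextlib.contextmanager
    def timer(self, name):
        """Record the elapsed time of the wrapped block."""
        start = time.monotonic()
        try:
            yield
        finally:
            with self._lock:
                self.timings.setdefault(name, []).append(
                    time.monotonic() - start)

metrics = Metrics()
with metrics.timer("request_seconds"):   # e.g. wrap a request handler
    metrics.incr("requests")

print(metrics.counters["requests"])      # 1
print(len(metrics.timings["request_seconds"]))  # 1 recorded timing
```

Thresholds for alerting, as described above, would then be evaluated against these counters and timing distributions.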

Testing: Apart from the manual and automated application testing, additional system level tests should be executed. These are
  • Performance Tests
  • Load Tests
  • Soak Test
  • Destructive Tests
  • Testing alerts, timeouts and throttling
This is by no means a complete list, but it covers key non-functional requirements that should be taken into consideration in the software development life cycle.

Wednesday, August 28, 2013

Software Architecture - Message Queues

I strongly believe that a key aspect of software engineering is applying proven architectural patterns. Lately I have wanted to write about some of these patterns which address many engineering concerns in any enterprise scale software. One such pattern is Asynchronous processing through message queues.

There is a ton of information available on the internet about the usage of message queues. I have been in many software design discussions where the topic of message queues would come up and someone would ask, "Why should we use message queues?" Here is my attempt to address this question.

Asynchronous Interaction: Many times, messages or events don't need to be processed immediately in real time and can be delayed. Critical processing can then be done in real time while non-critical processing is deferred, which can reduce the response time of the synchronous path. Message queues enable asynchronous processing by allowing messages to be put into a queue and processed later. Notifications such as e-mails are one such example. 
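The e-mail example can be sketched with Python's standard queue module and a worker thread. The "send" below just records the message instead of calling a real SMTP server, but the shape is the same: the request path returns immediately, and the slow work happens later.

```python
import queue
import threading

mail_queue = queue.Queue()
sent = []  # stand-in for the mail server

def mail_worker():
    """Drain the queue in the background; None is the shutdown sentinel."""
    while True:
        msg = mail_queue.get()
        if msg is None:
            break
        sent.append(f"sent: {msg}")  # stand-in for a real SMTP call
        mail_queue.task_done()

worker = threading.Thread(target=mail_worker, daemon=True)
worker.start()

def place_order(order_id):
    # The critical path only enqueues; it never waits on the mail server.
    mail_queue.put(f"order {order_id} confirmed")
    return "order accepted"

print(place_order(1))
print(place_order(2))
mail_queue.put(None)  # signal shutdown
worker.join()
print(sent)
```

A broker such as RabbitMQ or ActiveMQ plays the role of `mail_queue` across process and machine boundaries, which is where the remaining benefits below come from.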

Decoupled Architecture: Decoupled software components allow each component to evolve and scale without impacting other components. A message queue forms an intermediary between two components which agree on a data-based (message type) interface.

Reliability through Guaranteed Delivery: Messaging infrastructure provides guaranteed delivery of messages. In the event of failures, messages are not lost and can be recovered and reprocessed.

Scalability: Message queues decouple the producer and consumer components. This allows scaling up the rates at which messages are added to the queue or processed. By adding new processes, the system can scale without requiring additional code changes.

Resiliency: Message queues provide isolation between components. This means that the entire system doesn't go down if some parts fail. Systems can be designed so that critical components continue processing in the event of failures in other parts of the system.

Throttling: In order to protect a system from getting overloaded, it is necessary to throttle request processing. Typically, when a throttle limit is reached, the application denies further requests. This may not be acceptable in an HA system. In such cases, requests can be queued when the throttle limit is reached and processed once the load on the system reduces.
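The queue-instead-of-reject idea can be sketched as below. The in-flight limit and request names are illustrative; a real system would have workers pull from the deferred queue as capacity frees up.

```python
import queue

MAX_IN_FLIGHT = 2      # illustrative throttle limit
in_flight = 0
deferred = queue.Queue()

def handle(request):
    """Process up to the limit; park the overflow instead of rejecting it."""
    global in_flight
    if in_flight < MAX_IN_FLIGHT:
        in_flight += 1
        return f"processing {request}"
    deferred.put(request)  # over the limit: queue it rather than return a 503
    return f"queued {request}"

results = [handle(r) for r in ("r1", "r2", "r3", "r4")]
print(results)
print(deferred.qsize())  # the two overflow requests wait here
```

The trade-off is latency for availability: queued requests are served late rather than failed outright, which is usually the right choice for an HA system as long as the queue is bounded and monitored.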

Throughput: Message queues allow concurrent execution of consumer processes, which means that system throughput can be tuned by adding processes. However, there is some tension between throughput and reliability: one common way to increase message throughput is to turn off message persistence, which trades away guaranteed delivery.

Ordering: Driven by business needs, many applications require messages to be processed in a sequential manner. Most message brokers allow message ordering through server-side or client-side mechanisms.

Event Driven Processing: Messaging frameworks provide a mechanism to implement event-driven architectures. Such systems typically consist of event emitters (agents) and event consumers (sinks), and messaging frameworks fit naturally into this model.

This is not a complete list, but it covers some of the key benefits of message-based solutions.

Saturday, March 16, 2013

Agile software development manager

Agile software development methodologies like Scrum have become mainstream in the industry now. Scrum is a framework which defines a set of roles and events that the team follows, with the goal of delivering working software at the end of each short iteration. The framework defines three roles:
  1. Product Owner
  2. Scrum Master
  3. Team
The team consists of developers, testers, business analysts, architects and DBAs. The Product Owner is responsible for the vision and manages the priorities of the product features. The Scrum Master owns the process that the team follows. Although Scrum defines these roles, it doesn't say anything about the role of development/engineering managers and project managers. The role of a project manager in an agile environment is described well in this book. Being a development manager, I wanted to have a clear understanding of a manager's role in an agile environment. Many have talked about it here and also in books such as Management 3.0. Based on my experience and understanding, I have attempted to describe the responsibilities of a development manager below.

Delivery of Software Releases: Depending on the organizational structure, accountability for software releases lies with either the project manager or the development manager. The development manager ensures that the scrum team follows the release plan and that the release is ready to be shipped/deployed to customers. The project managers help with budgeting, risk management, milestone tracking and coordination of releases. Although the scrum master is responsible for removing bottlenecks, he or she may need help from management, and the manager needs to provide it.

Staffing: The development manager makes sure that the development team is fully staffed with the right people with the right skill sets. I have found this truly challenging, given the importance of having the right team members. He/she must work with HR and the recruitment department to ensure that a consistent hiring process is followed.

Manage Environment and Relationships: The agile movement has changed how we develop software, moving from the traditional command-and-control approach to the concept of a self-organizing team. The team commits to a sprint goal, decides how it wants to achieve it, and tries to meet it by the end of the sprint. The development manager ensures that a safe and fun environment exists where creativity and innovation can come out. He/she manages conflict and ensures effective collaboration and communication within and outside the team. The development manager works with the Scrum master to build a team where people trust each other and enjoy working together.

Manage processes and practices: The development manager is responsible for instituting the process which fosters better collaboration and visibility within and outside the team. In many organizations, the scrum team is responsible for getting the software ready to be shipped. However there are separate teams for software delivery, training, operations and support. In that case it is important to have clearly defined roles and responsibilities and process in place. RACI matrix helps in defining roles and responsibilities across multiple teams.

Coaching and Performance Improvements: As mentioned in the risk management book by Tom DeMarco, one of the risks to any software project is people turnover. Two major factors contribute to turnover: factors that push people out and factors that pull people to other companies. Not much can be done about the latter, but the push factors can be controlled. The manager needs to invest time in coaching team members and helping them with their career paths. He/she needs to actively support and encourage team members, share career opportunities, describe what it takes to get promoted, offer candid and actionable feedback, and lead by example. The development manager also evaluates performance and provides inputs for improvement.

Technology Radar: The development manager keeps in touch with the changing technology landscape. He/she doesn't need to be an expert in the technology (that is best left to architects and leads) but should be comfortable with current and upcoming technology that could impact the product. The manager tries to ensure that the team members are aware of newer technology that could solve business problems.

Reports and Metrics: People frown when the subject of metrics comes up, mainly because metrics don't get used properly in organizations. Used properly, metrics provide a means for continuous improvement. This is well described in this article about the use of metrics. The development manager identifies essential metrics and reports which can not only be used to make decisions but also provide scope for continuous improvement.

This, by no means, is a complete list. However, at a high level it covers the areas a development manager should focus on, especially in an agile environment.
  

Monday, April 23, 2012

Scrum Product Backlog

Scrum has proved to be one of the most successful software methodologies. In my current team, we have been using Scrum for almost two years now. Most teams perform the various Scrum events such as sprint planning, stand-up and sprint review, and they have a well-planned Sprint Backlog, but they lack a good Product Backlog. As mentioned in the Scrum Guide, the Product Backlog is a very important artifact. Many teams lack one, mainly because they may not know what a good backlog is. Here is my attempt to describe its importance and value. Many have already talked about it here and here.

As mentioned in the scrum guide, product backlog is

"An ordered list of everything that might be needed in the product and is the single source of requirements for any changes to be made to the product"

The Product Backlog contains all the features, functions, defects and enhancements required in a product. For a software product, that means all the functional as well as non-functional requirements. It needs to be updated regularly and should evolve over time. Being the single source of all requirements, it is the single most important artifact of a product.

What makes a good product backlog? One with the following characteristics:
  • Detailed: Higher-priority items are described in more detail, while lower-priority items have fewer details, until you can barely make out a product backlog item. The higher-priority items will be worked on in the upcoming sprints. It follows the iceberg model.

  • Estimated: Stories are appropriately sized in story points, an estimate of relative complexity. This is best achieved through planning poker, performed in the backlog grooming session. The team should spend at most 10% of sprint time on backlog grooming.
  • Emergent: The backlog is like a living entity; it grows and evolves. New items are added and old items can be removed as more information becomes available.
  • Prioritized/Ordered: Stories or defects are prioritized by the Product Owner based on the business value and customer requirements. 
A good product backlog means the following:
  • It helps with effective release planning: stories can be assigned to upcoming sprints which will constitute a release.
  • It helps with requirements clarity: in my personal experience, development teams often complain about a lack of clear requirements, but as the team grooms the backlog regularly, more details are added to the stories.
  • All of this leads to reduced risk when delivering the product.
So if you are following Scrum but haven't paid attention to the Product Backlog, I strongly recommend making it one of your highest priorities.

Saturday, January 21, 2012

Peeking at Heavens

I have always wanted to see objects in space through a telescope. I was finally able to satisfy that desire when I bought my first telescope in December 2011. With so many different types of telescopes available, the first challenge was choosing the right one for a beginner. I contacted some folks at the Texas Astronomical Society and they advised me to go for a reflector telescope like the Orion StarBlast 4.5 or a Dobsonian. I didn't want to buy a big telescope because it wouldn't be easy to carry around. Fortunately, Orion has good material and videos on their website, which was very helpful in choosing the scope. So I decided to go for the SpaceProbe 130ST.

The telescope arrived as scheduled. The assembly took two hours; although it was straightforward, I had to pay close attention to each and every step. I had to learn about aligning the telescope to polar north, which basically means pointing the telescope at the star Polaris. With the telescope ready, it was time to experience it. During the winter months in the northern hemisphere, the planet Jupiter shines brightly. I use Google Sky Map on my Android tablet to locate stars and planets. If you don't have it, there are free star charts available on the Astronomy.com website, but Google Sky Map is much easier to use.

So in the evening we decided to see the Moon and Jupiter from my backyard. My kids Shlok and Aneesh were more excited than me. I used the finder scope to point at the Moon, and when I looked through the eyepiece, the image was hazy. I knew something was not right. I pointed at Jupiter and saw the same thing. Then it struck me that the eyepiece has a knob to focus the lens. So I turned it and wow!! We could see the craters and the slight elevations of the mountains. I pointed at Jupiter and wow again!! We could see the two brown bands that Jupiter has, as well as its four moons, which on that night were aligned in a straight horizontal line. It was an amazing view.

Overall it was a very satisfying experience. One thing to note is that objects appear small through a telescope, nowhere close to what you see in photographs. However, the experience is still amazing. The next thing is to look at Mars and Saturn as they become visible in spring.

Tuesday, November 22, 2011

Self Organizing Team

Agile methodologies, especially Scrum, have the notion of a self-organizing team. Scrum is a process framework which has three main entities.
  1. Team 
  2. Product Owner
  3. Scrum Master
The team consists of developers, testers, business analysts, DBAs and architects. This group of people is given a set of tasks (stories) to finish within a time box known as a sprint or iteration. The team self-organizes and delivers the functionality at the end of the time box. Last month I had the experience of being part of such a team.

Sabre, the company I work for, organizes a Hack Day every year. The goal of Hack Day is to build a working software prototype in 24 hours. The team chooses the idea for the prototype, along with the technology and tools. The prototype is presented to a group of judges who decide the best hack. This year my team decided to participate.

As a team, we started brainstorming ideas for the hack. One of my colleagues, Raj Naini, presented the idea of using Near Field Communication (NFC) to build a paperless workflow for air travel. We work on a product called Sabresonic Loyalty, which helps manage the points and rewards given to travelers by airlines. Travelers can use the points to purchase tickets and ancillaries for their travel. So we decided to develop a prototype which allows travelers to spend their points using any NFC-enabled smartphone. The workflow is simple.
  1. Traveler's phone stores his/her account information. So the phone can be used like a wallet.
  2. The traveler checks in at the airport counter kiosk where he is given an option to buy ancillaries such as extra baggage, leg room etc.
  3. If the traveler decides to buy ancillaries, he/she is given an option to pay using his accumulated points. At this point the traveler brings his phone near the kiosk and is able to pay without even touching. The interaction is made possible using the NFC interfaces in the phone and the kiosk.
After the idea stage, the team decided on the tools and technologies. Through one or two whiteboard sessions, we decided on the following.
  • nfcpy library for exchanging information between the kiosk (simulated on a laptop) and the phone
  • Mobile app using jQuery Mobile and PhoneGap
  • Atmosphere to send async notifications to the mobile app
  • REST APIs exposed by the Loyalty product
On Hack Day the team got together in a big conference room and started coding. We paired on various pieces of work, each pair working on a separate piece which would later be integrated into the final prototype. It was amazing to see how well the team collaborated when every one of us was co-located. Being together in a separate room ensured there was no external distraction from the rest of the team. However, the team hadn't integrated the different pieces until two hours before the deadline. And then all the pieces fit together perfectly to create a successful prototype.

Although my team didn't win the first prize, we came a close second in the public voting award. The whole team felt great about the effort. It proved what is said about an agile self-organizing team:

"Bring a bunch of smart developers together, leave them alone to do their job and watch them create amazing software."