Testing Tour Stop #4: Exploring all about Testability, Operability and Observability with Rob Meaney

I had my fourth session on my #TestingTour with Rob Meaney. I was going through my routine of browsing the resources on the Ministry of Testing when I came across a Power Hour about operability by Rob Meaney. While reading the questions and answers, I had a few moments of "Yes, I'm going through a similar situation right now". I was already keen to understand operability, testability and observability, but that Power Hour made me even more curious and motivated to find out more, and most importantly I wanted to see if I could apply any of it in my current situation.


Our session started with Rob asking what kind of product or setup I'm working on and whether it's built on microservices.
I gave the context that our application is microservices-based and we are currently migrating to the cloud. I also shared a couple of scenarios about firefighting production bugs and how difficult it was for us to find the root cause. That's where he shared a scenario where he had been in the same situation. It was really great to know, as I had thought it was only us who were stuck in that kind of situation for our own reasons. That gave me some motivation that I can really try to change things.

Rob started off by sharing the truth that "deployment is the beginning and not the end". In fact, this is where we actually start to learn and understand the system, and that's where I nodded and said yes, yes. I could really agree with that, and I shared how I went from feeling that my job was done once I performed a smoke test on production after every release, to trying to understand the system and the users once the features were deployed. This is where he mentioned designing the system for testability and operability, and made a point that was such a great reminder: we testers are not just testers anymore, we are more like influencers within the team. That was a great takeaway from this session for me already.

Rob shared how, after some torturous regression nightmares, one of his developers approached him and asked how they could design the software to make it easier to test. That's where he went on to research and came back with the CODS model, which stands for Controllability, Observability, Decomposability, Simplicity. Can you believe that one of the developers in my team, who was helping me by writing automated tests/checks, got frustrated when he saw how the code had been implemented, because it limited the tests he could write?
I was even more keen to understand the CODS model, which might be useful for me at this point.

To start with Controllability, Rob explained with a few examples. For instance, if I have to test a calendar, I can ask the developer for a way to control the system, or to inject the time or date, so I can test the functionality instead of waiting until the next day. This got me thinking about one of the features of my application: when a user changes the status of a task, there is a rule that says "create a note for the user for the following day". If I'm testing this functionality, I should not have to wait until the following day to check whether the note has been created. There should be a way to inject or run something that allows me to test it straight away. And this, I thought, is Controllability.
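The "inject the time" idea can be sketched in a few lines. This is a minimal illustration, not my application's real code: the class and rule names are hypothetical, and the key point is only that the clock is a constructor parameter a test can replace.

```python
from datetime import date, timedelta

class NoteScheduler:
    """Creates a follow-up note when a task's status changes.

    The clock is injected as a callable so a test can control what
    'today' is instead of waiting for the real next day.
    (NoteScheduler and its rule are hypothetical, for illustration.)
    """

    def __init__(self, clock=date.today):
        self._clock = clock
        self.notes = []

    def on_status_change(self, task_id):
        # The rule from the post: create a note for the following day.
        due = self._clock() + timedelta(days=1)
        self.notes.append({"task": task_id, "due": due})
        return due


# In a test, inject a fixed clock instead of the real one:
scheduler = NoteScheduler(clock=lambda: date(2020, 1, 31))
due = scheduler.on_status_change("TASK-1")
assert due == date(2020, 2, 1)  # no waiting until tomorrow
```

The same design lets you test month-end and year-end edge cases on demand, which a real wall clock only offers once a year.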

When I told him how I had started to observe the logs (which were not really useful, as they contained loads of unnecessary information as well) while performing an action on the UI, to understand more about the system at the feature level, Rob mentioned that this too is part of observability. Yay, I got super thrilled here. It is the ability to understand what's happening within the system: anything that allows us to know whether there is a problem or not. That's where I shared another example: I was getting loads of 503 errors while testing, so I first checked the browser console, then tried the application logs, and finally, when I checked the services, found that one of them had gone down. Rob mentioned that this is where it is important to have structured logs. I took this as a takeaway and planned to encourage my team to adopt structured logging.

Rob then explained decomposability, which is all about having independently testable components: a change in one service should not affect any other service. This is where we discussed Pact, which allows us to test the agreement between APIs and gives confidence that a change made to an API hasn't broken the agreement. Rob also mentioned a great approach his teams follow with bug bashes: the contract is agreed before they develop a service, and when it's ready they run a bug bash with the team consuming the service, to get feedback on whether the contract still meets their needs.
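The core idea behind a contract check can be sketched without any tooling. This is a hand-rolled toy, not the Pact library (which adds mock providers, a broker, and real verification); the field names are invented for illustration.

```python
# The contract the consumer and provider agree on before the
# service is built: required fields and their types.
AGREED_CONTRACT = {
    "id": int,
    "status": str,
    "due_date": str,
}

def meets_contract(response: dict, contract: dict) -> bool:
    """True if the response has every agreed field with the agreed
    type. Extra fields are allowed: adding data is not a breaking
    change, but renaming or removing an agreed field is."""
    return all(
        field in response and isinstance(response[field], expected)
        for field, expected in contract.items()
    )

# A provider change that only adds a field keeps the agreement:
ok = meets_contract(
    {"id": 42, "status": "open", "due_date": "2020-02-01", "owner": "sam"},
    AGREED_CONTRACT,
)
assert ok

# A renamed field breaks the agreement, and this check catches it:
broken = meets_contract(
    {"id": 42, "state": "open", "due_date": "2020-02-01"},
    AGREED_CONTRACT,
)
assert not broken
```

Running a check like this in the consumer's pipeline is what gives the confidence Rob described: the provider can change freely as long as the agreement still holds.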

Simplicity is self-explanatory: the system needs to be easy to understand. We also discussed at length how they used a combination of feature flags and blue-green deployment, which was interesting to learn about. Then it was time to discuss the different exercises Rob has tried with his team, another interesting part of this session. We discussed the 10 P's of Testability, which I'm going to try with my team very soon. The other exercise was how they used team incident learning reviews to improve testability, which really got me thinking about how we could use them to learn after every incident. It's especially useful because we are a new team on this product, so it would really help us understand the system and the patterns in our problems too.
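The feature-flag half of that combination can be sketched very simply. This is a toy with an in-memory dict and a made-up flag name; a real system would back it with configuration or a flag service. The point is the separation of concerns: blue-green switches which deployment serves traffic, while the flag switches whether the new behaviour is visible.

```python
# Hypothetical flag store; real systems use config or a flag service.
FLAGS = {"new-notes-ui": False}  # ship the code dark, off by default

def is_enabled(flag: str) -> bool:
    return FLAGS.get(flag, False)

def render_notes_page() -> str:
    # Deployment and release are decoupled: the code is live in
    # production, but users only see it once the flag is flipped.
    if is_enabled("new-notes-ui"):
        return "new notes page"
    return "old notes page"

assert render_notes_page() == "old notes page"

FLAGS["new-notes-ui"] = True  # flip for a test cohort, or roll back instantly
assert render_notes_page() == "new notes page"
```

Turning a feature off is then a config change, not a redeploy, which pairs naturally with the "switch back quickly" goal of blue-green below.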


My key takeaways from the session:

  • Testers need to be influencers, not just testers.
  • The truth that deployment is the beginning and not the end. It’s where the actual understanding of the system starts.
  • Designing the system for testability and operability allows us to manage risks effectively, and to quickly detect and isolate the important problems with minimal customer impact.
  • The CODS model applies to Testability, but also, in a different way, to Operability:
    • Control risk exposure - use blue/green deploys so you can switch back quickly and easily
    • Observe system behaviour - add instrumentation to visualise critical customer pain points (error rate, response time) and make it visible to the whole team
    • Monitor during the release, including synthetic checks
    • Decompose releases - limit each release to a single change-set
    • Simplify the release process - the ability to deploy or roll back with a single click
  • Various exercises help with team collaboration, creating a safe space to learn and improve the system.
  • I learned how to create a culture focused on designing for testability and operability by asking simple questions like: how does it feel to...
    • Build your software systems?
    • Test your software systems?
    • Deploy your software systems?
    • Operate your software systems?
  • Observability starts with simple questions:
    • How would you know if your system was unhealthy?
    • How would you know if your users were having a bad experience?
    • In the event of a problem, how would you isolate the cause?
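The instrumentation takeaway above (error rate, response time, visible to the whole team) boils down to simple arithmetic over request records. The records here are invented for illustration; in practice they would come from structured logs or metrics, not a hardcoded list.

```python
import math

# Hypothetical request records, as instrumentation might emit them.
requests = [
    {"path": "/notes", "status": 200, "ms": 120},
    {"path": "/notes", "status": 503, "ms": 30},
    {"path": "/tasks", "status": 200, "ms": 95},
    {"path": "/tasks", "status": 200, "ms": 480},
]

# Error rate: share of requests with a 5xx status.
error_rate = sum(r["status"] >= 500 for r in requests) / len(requests)

def p95(values):
    """Nearest-rank 95th percentile: the latency 95% of requests beat."""
    ordered = sorted(values)
    return ordered[math.ceil(0.95 * len(ordered)) - 1]

p95_ms = p95([r["ms"] for r in requests])
print(f"error rate: {error_rate:.0%}, p95 latency: {p95_ms}ms")
```

Putting these two numbers on a dashboard the whole team watches during a release is what turns "how would you know your users were having a bad experience?" into something answerable.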

It was such an interesting and amazing session. It helped me understand these concepts, and now I can figure out how to apply them within my team and the product I'm working on.
