H(ai)ckathon

Andrei Zhozhin

October 1, 2021 | 5 minutes

Hackathon

This is my second hackathon, and I can conclude that I enjoy this type of activity. The topic of this hackathon was using AI/ML in existing applications.

Idea

Add voice search to the existing application, which business users run on mobile devices (and desktop), because it is hard to type long sentences on the go into narrow mobile interfaces. The existing application shows reports built from different measures (metrics) and different dimensions (time, organization hierarchy, economic sectors, etc.) to support business decisions.

Pre-requisites

The application should already support free type search.

Architecture

Speech to text feature architecture

Mic Button is a new UI element
Media Recorder is an abstraction over the browser recording API (it needs to support multiple browser audio formats)
Speech2Text UI controller orchestrates everything in the UI layer
Audio is sent in the HTTP body with Content-Type: audio/webm
WebM audio is transcoded to WAV using FFmpeg (a new process is spawned for every request)
Transcoded audio is sent to the Azure Speech to Text service
Azure Speech to Text service converts audio to text and returns Content-Type: application/json
The resulting JSON is returned to the UI
The recognized text with the highest probability is inserted into the search box element and the search is executed

(A) Deployment was performed with an ARM (Azure Resource Manager) template
(X) The Azure Speech to Text service and the Speech2Text microservice were tested with Postman
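The step where transcoded audio is sent to Azure Speech to Text could look roughly like the sketch below. It assumes the REST endpoint for short audio and 16 kHz PCM WAV input; the post does not say whether the REST API or an SDK was used, and the function names, region, and key handling here are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the short-audio REST endpoint of Azure Speech to Text is used.
AZURE_STT_URL = (
    "https://{region}.stt.speech.microsoft.com"
    "/speech/recognition/conversation/cognitiveservices/v1"
)


def build_stt_request(wav_bytes, region, subscription_key, language="en-US"):
    """Build (but do not send) the HTTP request carrying the WAV audio."""
    # format=detailed asks for several candidates with confidence scores.
    query = urllib.parse.urlencode({"language": language, "format": "detailed"})
    url = AZURE_STT_URL.format(region=region) + "?" + query
    headers = {
        "Ocp-Apim-Subscription-Key": subscription_key,
        # Assumption: audio was transcoded to 16 kHz PCM WAV beforehand.
        "Content-Type": "audio/wav; codecs=audio/pcm; samplerate=16000",
        "Accept": "application/json",
    }
    return urllib.request.Request(url, data=wav_bytes, headers=headers, method="POST")


def recognize(request, timeout=10):
    """Send the request and return the parsed JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping request construction separate from sending makes the URL and headers easy to check without network access, which is similar in spirit to testing the service with Postman first.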

Technical details

As the application was already implemented using the microservice approach, we just followed the same pattern. We selected Python for this microservice, as it requires only simple logic: register in service discovery, check security tokens, do the transcoding, and finally call the speech to text service. Everything was wrapped into a container and prepared for deployment to Kubernetes.

As every application manages its configuration in its own way, it was a bit challenging to add all the necessary configuration options in all the right places to make the application work properly with the other microservices.

Transcoding of audio was implemented with FFmpeg, an open-source and very powerful tool that can process audio and video and has a lot of capabilities. We quickly checked the available alternatives, and all of them were far from perfect (commercial solutions were not acceptable, and WebAssembly variants built with Emscripten looked dodgy and heavyweight). So we decided to implement old-school server-side audio transcoding.
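A minimal sketch of such server-side transcoding, spawning one FFmpeg process per request and streaming audio through stdin and stdout. The function names, the 16 kHz mono target, and the timeout are assumptions for illustration, not the values from the hackathon code.

```python
import subprocess


def build_ffmpeg_command(sample_rate=16000, channels=1):
    """Return the FFmpeg argument list for stdin -> WAV on stdout."""
    return [
        "ffmpeg",
        "-hide_banner",
        "-loglevel", "error",
        "-i", "pipe:0",        # read the browser recording from stdin
        "-ac", str(channels),  # assumed: mono is enough for speech
        "-ar", str(sample_rate),
        "-c:a", "pcm_s16le",   # 16-bit PCM
        "-f", "wav",
        "pipe:1",              # write the WAV to stdout
    ]


def transcode_to_wav(audio_bytes, timeout=30):
    """Spawn FFmpeg for a single request and return the WAV bytes."""
    result = subprocess.run(
        build_ffmpeg_command(),
        input=audio_bytes,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        timeout=timeout,
        check=False,
    )
    if result.returncode != 0:
        message = result.stderr.decode("utf-8", errors="replace")
        raise RuntimeError("FFmpeg failed: " + message)
    return result.stdout
```

FFmpeg detects the input container from the stream itself, so the same command should handle WebM and Ogg recordings. Because stdout is not seekable, the size fields in the WAV header cannot be filled in afterwards; if a consumer is strict about that, writing to a temporary file is the safer variant.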

Caveats

The audio recording API in the browser works only over HTTPS
The recorded audio format differs between browsers (Chromium-based: audio/webm, Firefox: audio/ogg, Safari: audio/wav)
Whenever the UI wants to access the microphone, the browser shows a permission popup, and if the user rejects or just ignores it, nothing works
As recording is available over HTTPS only, the back-end endpoint should also be behind HTTPS, otherwise requests would be blocked by the browser (mixed content is not allowed)
The Speech2text service might return multiple results with different recognition probabilities, so we need to decide which one to choose (results might vary slightly and have comparable probabilities)
Speech2text is not perfect and can return very different output depending on how one pronounces words (accent), so it is important to calibrate the probability threshold: keep it high, but not so high that the feature becomes unusable (around 75%)
If your system does not support free type search, this approach might be useless
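Choosing among several recognition results with a probability threshold can be sketched as follows. The field names `NBest`, `Confidence`, `Display`, and `Lexical` are what I would expect from the detailed output format of the Azure REST API and are an assumption here, as is the function name; the 0.75 default mirrors the threshold mentioned above.

```python
def pick_best_transcript(response, threshold=0.75):
    """Return the most confident transcript, or None if none is good enough."""
    candidates = response.get("NBest") or []
    if not candidates:
        return None
    # Candidates may have comparable scores; take the highest one.
    best = max(candidates, key=lambda c: c.get("Confidence", 0.0))
    if best.get("Confidence", 0.0) < threshold:
        # Below the threshold it is safer to ask the user to repeat.
        return None
    return best.get("Display") or best.get("Lexical")


# Hypothetical response used only to illustrate the selection logic.
sample = {
    "RecognitionStatus": "Success",
    "NBest": [
        {"Confidence": 0.62, "Display": "Show revenue buy region."},
        {"Confidence": 0.91, "Display": "Show revenue by region."},
    ],
}
best_text = pick_best_transcript(sample)
```

Returning None instead of a low-confidence guess lets the UI leave the search box untouched rather than run a search the user did not ask for.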

Lessons learned

Prepare your dev environment before the hackathon, otherwise precious time would be wasted on troubleshooting
Choose technologies that you are familiar with for implementation, otherwise precious time would be wasted on learning and troubleshooting
Have colleagues reachable during the hackathon who can advise you in areas where you are not an expert; if you get stuck, this may save you hours
Know the skills of your teammates as they could pair with you to help
Work in a pair with someone: a second pair of eyes can spot silly typos or stop you from going too deep
Plan time and make sure you don’t have distractions from your work
Agree with your family members about your limited availability to them and explain the importance of the event beforehand
Have regular breaks, otherwise productivity will degrade quickly and extra hours will not bring any value
Sleep a full 8 hours; it is the best recovery mechanism your body has
Avoid too much caffeine and other stimulants. If you plan to work long hours, let it be natural; if you drink too much, you will not sleep well and the second day will be ruined
Dry run your presentation several times to understand your timing and cut filler words and extra explanations

Timing

I worked from 7 am to 11 pm on both days, with breaks for breakfast, lunch, and dinner, and went for a run on the first day. I should have gone for a run on the second day as well, but the schedule got in the way. Such long working hours hit performance very hard.

Results

Our team won in the “most complete solution” category, as we managed to implement the full flow and demo real use cases. So we developed a complete business feature in around 48 hours.

Summary

Internal hackathons are very different by nature from “classic” hackathons, and this one was also completely online. Still, I can see that the main benefit of such a setup is networking: our team had members from the US, UK, Switzerland, Poland, and India. It was a great opportunity to work on an existing product and extend it in a meaningful way. I hope our changes will soon be merged into the master branch and released to users.

I’m looking forward to offline hackathons, when everyone is in the same room working hard for 2 days in a row to deliver a great product :)

I would also like to thank all the teammates who worked hard with me on the project during this time: folks, you are a great team!