H(ai)ckathon
Andrei Zhozhin
| 5 minutes
This is my second hackathon, and I can conclude that I enjoy this type of activity. The topic of this hackathon was applying AI/ML to existing applications.
Idea
Add voice search to an existing application used by business users on mobile devices (and desktop), since it is hard to type long sentences on the go into narrow mobile interfaces. The existing application shows reports built from different measures (metrics) and different dimensions (time, organization hierarchy, economic sectors, etc.) to support business decisions.
Pre-requisites
The application should already support free-text search.
Architecture
- Mic Button is a new UI element
- Media Recorder is an abstraction over the browser recording API (it needs to support multiple browser audio formats)
- Speech2Text UI controller orchestrates everything in the UI layer
- Audio is sent in the HTTP body using Content-Type: audio/webm
- WebM audio is transcoded to WAV using FFmpeg (a new process is spawned for every request)
- Transcoded audio is sent to Azure Speech to Text service
- Azure Speech to Text service converts the audio to text and returns Content-Type: application/json
- The resulting JSON is returned to the UI
- The recognized text with the highest probability is inserted back into the search box element and the search is executed
- (A) Deployment was performed with an ARM (Azure Resource Manager) template
- (X) Testing of the Azure Speech to Text service and the Speech2Text uService was performed with Postman
Technical details
As the application was already built using the microservice approach, we simply followed the same pattern. We chose Python for this microservice, as it requires only simple logic: register in service discovery, check security tokens, transcode the audio, and finally call the speech-to-text service. Everything was packaged into a container and prepared for deployment to Kubernetes.
As every application manages configuration in its own way, it was a bit challenging to add all the necessary configuration options in all the right places so that the new service worked properly with the other microservices.
Transcoding of audio was implemented with FFmpeg, an open-source and very powerful tool for processing audio and video. We quickly checked the available alternatives, and all of them were far from perfect (commercial solutions were not acceptable, and WebAssembly variants built with Emscripten looked dodgy and heavyweight). So we decided to implement old-school server-side audio transcoding.
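The server-side transcoding step could look roughly like the sketch below. It assumes an `ffmpeg` binary on the PATH and pipes audio through stdin/stdout so that no temporary files are needed. The function names, the 16 kHz mono output settings, and the timeout are illustrative assumptions, not the code we actually shipped.

```python
import subprocess
from typing import List


def build_ffmpeg_command(sample_rate: int = 16000, channels: int = 1) -> List[str]:
    """Build an FFmpeg command that reads audio from stdin and writes WAV to stdout.

    The sample rate and channel count are assumed defaults, not values from the project.
    """
    return [
        "ffmpeg",
        "-hide_banner",
        "-loglevel", "error",
        "-i", "pipe:0",            # read the uploaded audio (e.g. WebM) from stdin
        "-ar", str(sample_rate),   # resample to the target sample rate
        "-ac", str(channels),      # downmix to the target channel count
        "-f", "wav",               # force WAV output, since stdout has no file extension
        "pipe:1",                  # write the result to stdout
    ]


def transcode_to_wav(audio: bytes, timeout_s: float = 30.0) -> bytes:
    """Spawn one FFmpeg process for this request and return the WAV bytes."""
    result = subprocess.run(
        build_ffmpeg_command(),
        input=audio,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        timeout=timeout_s,
        check=False,
    )
    if result.returncode != 0:
        message = result.stderr.decode(errors="replace")
        raise RuntimeError(f"ffmpeg failed: {message}")
    return result.stdout
```

Spawning a process per request keeps the service simple, at the cost of some extra latency on every call.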
Caveats
- The audio recording API in the browser works only over HTTPS
- The recorded audio format differs between browsers (Chromium-based: audio/webm, Firefox: audio/ogg, Safari: audio/wav)
- When the UI wants to access the microphone, the browser shows a permission popup every time, and if the user rejects or simply ignores the popup, nothing works
- As recording is available over HTTPS only, the back-end endpoint must also be behind HTTPS; otherwise, requests are blocked by the browser (mixed content is not allowed)
- The speech-to-text service might return multiple results with different recognition probabilities, so we need to decide which one to choose (results might vary slightly and have comparable probabilities)
- Speech-to-text is not perfect and can return very different output depending on how the speaker pronounces words (accent), so it is important to calibrate the probability threshold: keep it high, but not so high that the feature becomes unusable (around 75%)
- If your system does not support free-text search, this approach might be useless
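Choosing a result and applying the threshold can be sketched as follows. The `NBest`, `Confidence`, and `Display` field names assume the detailed output format of the Azure Speech to Text REST response. The function name and the sample payload are made up for illustration.

```python
from typing import Optional


def pick_best_transcript(response: dict, threshold: float = 0.75) -> Optional[str]:
    """Return the most confident transcript, or None if nothing passes the threshold.

    Assumes a response shaped like {"NBest": [{"Confidence": float, "Display": str}, ...]}.
    """
    candidates = response.get("NBest") or []
    if not candidates:
        return None
    # Take the candidate with the highest reported confidence.
    best = max(candidates, key=lambda c: c.get("Confidence", 0.0))
    # Reject low-confidence results instead of running a wrong search.
    if best.get("Confidence", 0.0) < threshold:
        return None
    return best.get("Display")


# Hypothetical payload, for illustration only.
sample = {
    "RecognitionStatus": "Success",
    "NBest": [
        {"Confidence": 0.62, "Display": "Show revenue by sector."},
        {"Confidence": 0.91, "Display": "Show revenue by region."},
    ],
}
best_text = pick_best_transcript(sample)
```

When the function returns `None`, the UI can ask the user to repeat the query rather than run a search on a doubtful transcript.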
Lessons learned
- Prepare your dev environment before the hackathon; otherwise, precious time will be wasted on troubleshooting
- Choose technologies that you are familiar with for implementation; otherwise, precious time will be wasted on learning and troubleshooting
- Have colleagues reachable during the hackathon who can advise you in areas where you are not an expert; if you are stuck, this may save you hours
- Know the skills of your teammates, as they could pair with you to help
- Work in a pair with someone: a second pair of eyes can help spot silly typos or stop you from going too deep
- Plan your time and make sure you don’t have distractions from your work
- Agree with your family members beforehand about your limited availability and explain the importance of the event
- Take regular breaks; otherwise, productivity will degrade quickly and the extra time will not bring any value
- Sleep a full 8 hours; it is the best recovery mechanism your body has
- Avoid too much caffeine and other stimulants. If you plan to work long hours, let it be natural; if you drink too much, you will not be able to sleep well and the second day will be ruined
- Dry run your presentation several times to understand your timing and cut filler words and extra explanations
Timing
I worked from 7 am to 11 pm on both days, with breaks for breakfast, lunch, and dinner, and went for a run on the first day. I should have run on the second day as well, but the schedule got in the way and I was not able to. Such long working hours hit performance very hard.
Results
Our team won in the category “most complete solution”, as we managed to implement the full flow and demo real use cases. So we developed a complete business feature in around 48 hours.
Summary
Internal hackathons are very different in nature from “classic” hackathons. This one was also completely online. Still, I can see networking as the main benefit of such a setup. Our team had members from the US, UK, Switzerland, Poland, and India. It was a great opportunity to work on an existing product and extend it in a meaningful way. I hope our changes will soon be merged into the master branch and released to users.
I’m looking forward to offline hackathons, when everyone is in the same room working hard for 2 days in a row to deliver a great product :)
I would also like to thank all the teammates who worked hard with me on the project during this time: folks, you are a great team!