There are a million and one services for voice transcription on the market. But even with just one job to do, I’ve never seen a service that can handle the long tail of vocabulary used in the real world. This is particularly challenging if you’re a startup trying to sell your service to enterprises that rely on accurate transcription for their operations.
Jon Goldsmith, co-founder of Tetra, a voice transcription startup, understands this challenge — in fact, he is even willing to admit that he hasn’t 100 percent cracked the problem. But Goldsmith believes the answer lies in deep learning, and he’s setting out to prove it with a $ 1.5 million seed round led by Amplify Partners, with participation from Y Combinator and a number of angels.
I dropped by the Tetra office to check out what Goldsmith, his co-founder Nik Liolios and one other engineer had created. Goldsmith gave me a call using his smartphone with the Tetra app installed. As he and the deep learning models running in the background listened, I threw out a barrage of challenges for the transcription service.
Speaking at varying speeds, throwing out numbers, startup names and other tough words did stump Tetra to some degree — but to be fair, there is no AI that I haven’t broken. Given how easy Tetra is to use, I could see it being used as a backup reference or for record keeping — turn it on, forget about it and use it to search through notes later.
In cases where 99 or 100 percent accuracy is required, Tetra offers human transcription for a fee and a 24-hour wait. This actually helps both customers and Tetra in the sense that accurate transcriptions can feed back as training data to improve future performance.
Goldsmith told me he is finding traction selling to investors making frequent diligence calls. These customers want Tetra to create a permanent record of conversations with industry experts. Other, more traditional, enterprise use cases exist as well, like within sales.
This seems to be working out fairly well for the company. And things remain fairly lean with the three-person Tetra team working out of a residential apartment dually zoned for commercial. On the engineering side, a lot of the underlying infrastructure is being powered by off-the-shelf APIs.
This is actually a good thing, because it means Tetra isn’t wasting time building things that already exist on the market and instead is focusing on collecting a massive transcriptions data set that will only continue to improve the quality of the service moving forward.
The team’s approach is heavily dependent on being able to optimize which parts of conversations are sent to which cloud API. For example, some NLP service providers are better at understanding speech relating to movies, music and media, while others are better at numbers, etc.
The $ 1.5 million in seed financing is going to be used to scale up the engineering team and improve machine learning pipelines. Tetra includes search functionality so users can quickly find specific sentences within traditionally unsearchable voice recordings. I could see this becoming more proactive in the future — flagging names and dates automatically, for example.