Transcribing Voice Inputs: Store and Send Temporary Files?

#1 by Hauke

Hi all,

I am working on an experiment where I record participants' decisions, which they express verbally (spoken out loud). As recommended by Chris some time ago, I encode the voice recording as a Base64 string and store it in a LongStringField. This works fine and is very useful.
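For readers following along, the storage step might look like the sketch below. It assumes the browser sends a data URL (as produced by `FileReader.readAsDataURL`), and the helper names `to_field_value` and `from_field_value` are hypothetical. In oTree, the resulting string would go into a field declared as `models.LongStringField()`.

```python
import base64

def to_field_value(data_url_or_b64: str) -> str:
    """Strip an optional 'data:...;base64,' prefix so only the payload is stored."""
    # A data URL looks like 'data:audio/webm;base64,AAAA...'; keep what follows the comma.
    if data_url_or_b64.startswith("data:"):
        return data_url_or_b64.split(",", 1)[1]
    return data_url_or_b64

def from_field_value(stored: str) -> bytes:
    """Decode the stored Base64 text back into the raw audio bytes."""
    return base64.b64decode(stored)

# Round trip with stand-in bytes (not a real recording).
fake_audio = b"\x1aE\xdf\xa3 stand-in webm bytes"
data_url = "data:audio/webm;base64," + base64.b64encode(fake_audio).decode("ascii")
stored = to_field_value(data_url)
assert from_field_value(stored) == fake_audio
```

Note that Base64 text is roughly a third larger than the underlying audio, which matters for long recordings.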

However, I want to go a step further and also transcribe the recording in near real time, so that my next app or page can use the decisions each participant made. I thought of using the Whisper API, which requires me to store the voice recording as an audio file somewhere. Is that possible from within oTree? (After transcription, I could delete the file if it was on the server, because I would have all the information I need.)

Thanks in advance
Hauke

#2 by clintmckenna

Not sure if you have worked out a solution, but I will post this in case it is helpful to others. I have adapted some code from another project to save audio as a webm file and run it through the Whisper API using the live page function. Here is the repository: https://github.com/clintmckenna/oTree-Whisper

Each file is just saved under the player's id_in_group (e.g. 1.webm) in the static folder. No idea if this scales well or not. Perhaps you could add a line to delete each file after the page is done if you don't need to keep the files (or your IRB doesn't want you to). Thanks and let me know if it works!

#3 by Chris_oTree

Two issues with storing files in the static folder:

- All files in the _static folder are publicly accessible. So if someone knows your file naming scheme, they can go to yoursite.herokuapp.com/static/1.webm and download a user's recording.
- If using Heroku, your app should not create any files in the project folder, because Heroku automatically deletes all those files on a regular basis (e.g. daily).
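If a real file is needed only for the duration of the transcription, one way to sidestep both issues is to write it to the operating system's temporary directory instead of `_static`. That location is outside the project folder, is not served to visitors, and the file name is random. A minimal sketch, assuming the audio is already available as bytes; `with_temp_audio` and its `handler` callback are hypothetical names:

```python
import os
import tempfile

def with_temp_audio(audio_bytes: bytes, handler, suffix: str = ".webm"):
    """Write audio to a randomly named temp file, pass its path to handler, then delete it."""
    # delete=False so the file can be reopened by path on every platform;
    # it is removed explicitly in the finally block instead.
    tmp = tempfile.NamedTemporaryFile(suffix=suffix, delete=False)
    try:
        tmp.write(audio_bytes)
        tmp.close()  # flush to disk before the handler opens the path
        return handler(tmp.name)
    finally:
        tmp.close()  # harmless if already closed
        os.remove(tmp.name)

# Example handler: just report the size of the temporary file in bytes.
size = with_temp_audio(b"stand-in audio", os.path.getsize)
```

The file is removed even if the handler raises, so nothing is left behind when a transcription call fails.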

#4 by clintmckenna

Thanks Chris! Interesting challenges with the static folder, and with the Whisper API's quirks when it is not given a real file. I have updated my repository to write the file to Amazon S3 and then read it back from there instead. I think this should address the two issues by not saving anything on the server itself!
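The write-then-read pattern could be sketched roughly as follows. This is not the code from the repository: `s3_round_trip`, the `recordings/` key prefix, and the `keep` flag are hypothetical, and the client is passed in so the function can be tried with any object that offers `put_object`, `get_object`, and `delete_object` (a boto3 S3 client does).

```python
import uuid

def s3_round_trip(s3_client, bucket: str, audio_bytes: bytes, keep: bool = False) -> bytes:
    """Upload audio under a random key, read it back, and optionally delete it."""
    # A random key avoids the guessable '1.webm' style of name.
    key = f"recordings/{uuid.uuid4().hex}.webm"
    s3_client.put_object(Bucket=bucket, Key=key, Body=audio_bytes)
    try:
        # get_object returns a dict whose "Body" is a readable stream.
        return s3_client.get_object(Bucket=bucket, Key=key)["Body"].read()
    finally:
        if not keep:
            s3_client.delete_object(Bucket=bucket, Key=key)
```

With boto3 installed and credentials configured, the client would typically be created with `boto3.client("s3")`; pass `keep=True` to retain the recording in the bucket.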

#5 by Hauke

Thanks Clint and Chris!

If I store the audio file as a base64 LongStringField and only need the *.webm file to be passed to the Whisper API, then Clint's initial solution should work for me (perhaps with the addition of deleting the file from the static folder after it has been transcribed), right?

#6 by clintmckenna (edited)

Actually, it turns out I was overcomplicating things... You can just pass the base64 decoded data to Whisper in the requests.post() function. No need to save the webm files anywhere! I have updated my code, but left in the Amazon S3 bucket code in case anyone wants to save their audio.
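A rough sketch of that idea, not the repository's actual code: the helper names `build_whisper_request` and `transcribe` are hypothetical, and it assumes OpenAI's hosted transcription endpoint with the `whisper-1` model. The upload is given a `.webm` file name on the assumption that the API uses the extension to detect the audio format.

```python
import base64

WHISPER_URL = "https://api.openai.com/v1/audio/transcriptions"

def build_whisper_request(b64_audio: str, api_key: str) -> dict:
    """Assemble keyword arguments for requests.post() from a Base64 recording."""
    audio_bytes = base64.b64decode(b64_audio)
    return {
        "headers": {"Authorization": f"Bearer {api_key}"},
        # (filename, content, MIME type): requests builds the multipart body in memory,
        # so no file is ever written to disk.
        "files": {"file": ("recording.webm", audio_bytes, "audio/webm")},
        "data": {"model": "whisper-1"},
        "timeout": 60,
    }

def transcribe(b64_audio: str, api_key: str) -> str:
    """Send the decoded audio to the API and return the transcribed text."""
    import requests  # third-party; imported here so the helper above works without it

    response = requests.post(WHISPER_URL, **build_whisper_request(b64_audio, api_key))
    response.raise_for_status()
    return response.json()["text"]
```

In an oTree live page handler, something like `transcribe(player.audio_b64, api_key)` could then store the returned text on the player, where `audio_b64` is a placeholder for whatever the LongStringField is called.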
