IBM Watson VM transcription


  • Post
    kbohannon
    Participant

    I jumped on board with VitalPBX a few years ago when I learned that with IBM Watson I could finally have voicemail transcription, and it works wonderfully. This question is a little off topic, but I hope I might find an answer here. Watson’s Speech to Text engine uses markers like %HESITATION%, and I would love to replace this particular marker with a simple ellipsis. Has anyone else who uses Watson’s STT successfully made this change, either on the IBM side or in the script itself?

    # API_KEY and URL must be set earlier in the script (not shown in this excerpt).
    LANGUAGE="en-US"
    PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
    pushd .
    TMPDIR=$(mktemp -d)
    cd "$TMPDIR"
    # Asterisk pipes the complete MIME message to this script on stdin.
    cat >> stream.org

    # Split the message at each MIME boundary into stream.part, stream.part1, ...
    BOUNDARY=$(grep "boundary=" stream.org | cut -d'"' -f 2)
    awk '/'$BOUNDARY'/{i++}{print > "stream.part"i}' stream.org
    PLAINTEXT=$(cat stream.part1 | grep 'plain')
    if [ "$PLAINTEXT" != "" ]
    then
        # No audio attachment: pass the message through unchanged.
        cat stream.org > stream.new
    else
        # Part 3 is the WAV attachment: 6 header lines, then the base64 body.
        sed '7,$d' stream.part3 > stream.part3.wav.head
        sed '1,6d' stream.part3 > stream.part3.wav.base64

        dos2unix -o stream.part3.wav.base64
        base64 -di stream.part3.wav.base64 > stream.part3.wav

        # Re-encode the attachment as MP3 to shrink the email.
        lame -m m -b 24 stream.part3.wav stream.part3.mp3

        base64 stream.part3.mp3 > stream.part3.mp3.base64

        sed 's/x-[wW][aA][vV]/mpeg/g' stream.part3.wav.head | sed 's/\.[wW][aA][vV]/.mp3/g' > stream.part3.mp3.head

        CURL_OPTS=""

        # Send the original WAV to Watson Speech to Text.
        curl -s $CURL_OPTS -k -u "apikey:$API_KEY" -X POST \
            --limit-rate 40000 \
            --header "Content-Type: audio/wav" \
            --data-binary @stream.part3.wav \
            "$URL/v1/recognize?model=en-US_NarrowbandModel" 1>audio.txt

        # Pull the text out of each "transcript": "..." line of the JSON reply.
        TRANSCRIPT=`cat audio.txt | grep transcript | sed 's#^.*"transcript": "##g' | sed 's# "$##g'`

        # Rebuild the message: headers, text part, transcript, MP3 attachment.
        mv stream.part stream.new
        cat stream.part1 >> stream.new
        sed '$d' < stream.part2 >> stream.new

        echo "---Transcription---" >> stream.new

        tail -1 stream.part2 >> stream.new
        echo -e "\r\n\r\n$TRANSCRIPT\r\n\r\n" >> stream.new

        cat stream.part3.mp3.head >> stream.new
        dos2unix -o stream.new

        unix2dos -o stream.part3.mp3.base64
        cat stream.part3.mp3.base64 >> stream.new

        echo "" >> stream.tmp
        echo "" >> stream.tmp
        cat stream.part4 >> stream.tmp
        dos2unix -o stream.tmp
        cat stream.tmp >> stream.new

    fi
    cat stream.new | sendmail -t
    popd
    rm -Rf "$TMPDIR"
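
    One way the marker could be replaced in the script itself is sketched below. This is only a rough idea, not something from the original script: replace_hesitation is a made-up helper name, and it assumes the marker arrives in TRANSCRIPT as the literal text %HESITATION, with or without a trailing percent sign.

```shell
# Hypothetical helper (not part of the original script): swap Watson's
# hesitation marker for an ellipsis.
replace_hesitation() {
    # Matches "%HESITATION" with or without a trailing "%".
    printf '%s\n' "$1" | sed 's/%HESITATION%\{0,1\}/.../g'
}

# Example transcript text, invented for illustration.
TRANSCRIPT="so %HESITATION call me back %HESITATION% thanks"
TRANSCRIPT=$(replace_hesitation "$TRANSCRIPT")
echo "$TRANSCRIPT"
```

    In the script above, the call would presumably go right after the line that sets TRANSCRIPT and before the transcript is echoed into stream.new.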

Viewing 12 replies - 1 through 12 (of 12 total)
  • Replies
    mo10
    Moderator

    Off topic: do you like IBM Watson? I get better results with the Google Speech API, though that might be because I use it for German.
    With Google Speech there is no %HESITATION%.

    kbohannon
    Participant

    At the time I had read Watson’s STT was superior but I’m open to change. Have you been happy with Google’s?

    mo10
    Moderator

    Try it out and let me know what you think of IBM vs. Google:
    https://gist.github.com/tony722/7c6d86be2e74fa10a1f344a4c2b093ea

    “gcloud utility must be authenticated. Before doing this, ‘su asterisk’ so authentication happens in the correct user account”
    This does the trick:
    su -s /bin/bash asterisk
    PitzKey
    Participant

    I’m interested in trying this as well. Do you guys have any documentation links for this?

    Thanks

    kbohannon
    Participant

    I got the Watson transcription script over at nerdvittles.com; just search for “vitalpbx transcription” and it should turn up. After installing, I tweaked the sendmailibm file as well as the formatting of the email body via the GUI to get the appearance just right. I’ve been quite happy with it except for the hesitation markers. For that alone I may switch to Google’s.

    mo10
    Moderator

    @kbohannon

    Please write me an e-mail so we can get in contact:
    vitalpbx2020@contbay.com

    Thank you

    InTeleSync
    Participant

    Has anyone had any success getting Google Speech-to-Text working with VitalPBX? I have followed tony722’s excellent instructions at https://gist.github.com/tony722/7c6d86be2e74fa10a1f344a4c2b093ea, but the hang-up comes at the point of invoking the script when a voicemail is left in VitalPBX.

    I tried to call the script by appending mailcmd=/usr/sbin/sendmail-gcloud at the end of the [general] section in /etc/asterisk/ombutel/voicemail__10-general.conf, but the script never seems to get invoked. In fact, this change completely breaks the sending of the voicemail message with attachment via email. I have verified that without the script call VitalPBX can send voicemail messages via sendmail or SMTP.

    So how can we get this script to invoke properly when a voicemail is left in VitalPBX?

    InTeleSync
    Participant

    Can verify Tony’s script works well on VitalPBX!

    https://gist.github.com/tony722/7c6d86be2e74fa10a1f344a4c2b093ea

    sudo su -

    Install the Google Cloud SDK. Follow the instructions here: https://cloud.google.com/sdk/docs/downloads-yum

    Next initialize your GCS account as the asterisk user (VERY IMPORTANT).

    su -s /bin/bash asterisk
    gcloud init

    You’ll be presented with an OAuth2 URL. Copy it and launch a browser to log in using your Google Cloud account. It’ll give you a token to copy and paste back into your shell session.

    gcloud will then prompt you to select your Google API project. You should already have one before doing any of this.

    If asked “Do you want to configure a default Compute Region and Zone?”, I chose No.

    Exit back to the root shell if you’re still the asterisk user.

    exit
    yum install jq
    vi /usr/sbin/sendmail-gcloud

    Copy (or fetch) in the script from Github and save.

    chmod 755 /usr/sbin/sendmail-gcloud
    chown asterisk:asterisk /usr/sbin/sendmail-gcloud
    chmod 777 /usr/bin/dos2unix

    Now per VitalPBX’s spectacular support…

    Create the file:

    vi /etc/asterisk/ombutel/voicemail__20-general-custom.conf

    We’re extending the voicemail general settings with a parameter of our own, which is mailcmd. This is so we don’t get overwritten on the next system update.

    [general](+)
    mailcmd=/usr/sbin/sendmail-gcloud
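
    If you would rather not open vi, the same two lines could be written with a here-document. This is a minimal sketch: write_mailcmd_conf is a made-up helper name, and the demo writes to a scratch file instead of the real path so it can be tried safely first.

```shell
# Hypothetical helper: write a voicemail override file that sets mailcmd.
write_mailcmd_conf() {
    conf_path=$1
    mail_cmd=$2
    cat > "$conf_path" <<EOF
[general](+)
mailcmd=$mail_cmd
EOF
}

# Demo against a scratch file. On the PBX the target would be
# /etc/asterisk/ombutel/voicemail__20-general-custom.conf (run as root).
DEMO_CONF=$(mktemp)
write_mailcmd_conf "$DEMO_CONF" /usr/sbin/sendmail-gcloud
cat "$DEMO_CONF"
```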

    You want to make certain the sendmail-gcloud script is owned by asterisk.

    chown asterisk:asterisk /usr/sbin/sendmail-gcloud

    You can validate that with:

    sudo su -
    su -s /bin/bash asterisk

    Then just run the script:

    bash-4.2$ /usr/sbin/sendmail-gcloud

    If all is well you’ll get an echo back of the directory you’re sitting in. Kill it (Ctrl-C) and exit. Go to /tmp and you should see the directory the script created with an empty stream.org file within. You can delete those.

    In VitalPBX go to Settings -> Voicemail Settings and do a Save and Reload.


    @intelesync

    Thanks for this outstanding guide!
    kbohannon
    Participant
    @intelesync

    Guide worked perfectly. However, the email I’m getting is showing that VitalPBX is sending null info. Is there something I am missing in how VitalPBX is converting the audio file before sending it to Google Cloud?

    if [ -z "$FILTERED" ]
    then
    echo "(We were unable to recognize any speech in audio data.)" >> stream.new

    kbohannon
    Participant
    Hoping someone can help. I am unable to get the transcription back from Google. In order to test, I commented out the last line of the script (#rm -Rf $TMPDIR) so that it keeps the streamed files in /tmp/tmp.***. If I manually send the same stream.part3.flac file to Google, the results come back without a problem (love the punctuation, much better than IBM’s results):

    [root@cf tmp]# cd tmp.wunR4plUsY/
    [root@cf tmp.wunR4plUsY]# gcloud ml speech recognize stream.part3.flac --language-code='en-US'
    {
    "results": [
    {
    "alternatives": [
    {
    "confidence": 0.76667136,
    "transcript": "How are you today? The weather is wonderful."
    }
    ]
    }
    ]
    }
    [root@cf tmp.wunR4plUsY]#

    However, there is an issue with how the file is sent to Google, or how it is being returned as per the script, because I only get the result from the line below.

    if [ -z "$FILTERED" ]
    then
    echo "(Google was unable to recognize any speech in audio data.)" >> stream.new

    What am I missing that it would work if I manually send it, and not if the script is doing its thing? Thanks in advance.
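
    For comparison, a response shaped like the one shown above can be checked by hand with jq, which the guide installs. This is only a sketch: extract_transcript and its filter are illustrative and may differ from what the actual script does, and the RESPONSE string is a compacted copy of the output above.

```shell
# Hypothetical helper: join the first alternative of every result into one line.
# Reads the JSON response on stdin; prints an empty line if there are no results.
extract_transcript() {
    jq -r '[.results[]?.alternatives[0].transcript] | join(" ")'
}

# Compacted copy of the gcloud response shown above.
RESPONSE='{"results":[{"alternatives":[{"confidence":0.76667136,"transcript":"How are you today? The weather is wonderful."}]}]}'
FILTERED=$(printf '%s' "$RESPONSE" | extract_transcript)

if [ -z "$FILTERED" ]
then
    echo "(no transcript found in response)"
else
    echo "$FILTERED"
fi
```

    If this prints the transcript when run by hand but FILTERED is still empty inside the script, the JSON parsing is probably not at fault, and it is worth looking at what gcloud itself returns when the script runs as the asterisk user.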

    • This reply was modified 3 months, 3 weeks ago by kbohannon.
    kbohannon
    Participant
    Resolved, finally. It worked after I commented out the line

    export CLOUDSDK_CONFIG=/home/asterisk/.config/gcloud

    So obviously I made a mistake in the initial configuration. Glad I can finally see how much better Google STT looks, though. No more %HESITATION% tags!
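
    For anyone who hits the same thing: gcloud reads CLOUDSDK_CONFIG as the location of its configuration directory, so one guess is that the path in the script did not match where gcloud init stored the asterisk user's credentials. Instead of commenting the line out, the export could be guarded. This is a sketch only, and use_gcloud_config is a made-up helper name.

```shell
# Hypothetical helper: only point gcloud at a config directory that exists.
use_gcloud_config() {
    if [ -d "$1" ]
    then
        export CLOUDSDK_CONFIG="$1"
    else
        # Fall back to gcloud's default location for the current user.
        unset CLOUDSDK_CONFIG
        echo "gcloud config dir not found: $1 (using gcloud's default)" >&2
    fi
}

# Path taken from the line that was commented out in the script.
use_gcloud_config /home/asterisk/.config/gcloud
```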
