IT|Software|Applications|Speech Recognition|Real-Time Spoken Interpretation DIY-2: Using It with a RESTful API

 
 
[Wiring]
3.3 (must be 3.3 V)    3.3 V
11 (BCM 17)            resistor
3 (GND)                ground
rec.py
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
 
import RPi.GPIO as GPIO
import pyaudio
import wave
import os
import sys
 
def rec_fun():
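    # Hide error messages: ALSA and JACK print a flood of errors even though recording works fine.
    # Note that this also silences Python tracebacks written to stderr.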
    os.close(sys.stderr.fileno())
   
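    BUTT = 17    # Record button: one side wired to GPIO17 (BCM numbering), the other to ground
    GPIO.setmode(GPIO.BCM)
    # Configure GPIO17 as an input with the pull-up enabled, so the pin reads low while the button is pressed
    GPIO.setup(BUTT, GPIO.IN, pull_up_down = GPIO.PUD_UP)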
 
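    # Audio is read from the stream in blocks of CHUNK frames; think of a chunk as one packet of data.
    CHUNK = 512
    FORMAT = pyaudio.paInt16  # pyaudio.paInt16 means each sample is recorded with 16-bit quantization
    RATE = 44100  # Sampling rate of 44.1 kHz: 44,100 samples per second
    WAVE_OUTPUT_FILENAME = "command.wav"
    print('Press and hold the button to start recording...')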
    GPIO.wait_for_edge(BUTT, GPIO.FALLING)
 
    # To use PyAudio, first instantiate PyAudio using pyaudio.PyAudio(), which sets up the portaudio system.
    p = pyaudio.PyAudio()
    stream = p.open(format = FORMAT,
                    channels = 1,    # The Cloud Speech API only supports mono audio
                    rate = RATE,
                    input = True,
                    frames_per_buffer = CHUNK)
    print("Recording...")
 
    frames = []
    # Keep recording while the button is held down; stop when it is released
    while GPIO.input(BUTT) == 0:
        data = stream.read(CHUNK)
        frames.append(data)
    print("Recording finished. Output file: " + WAVE_OUTPUT_FILENAME + '\n')
    stream.stop_stream()
    stream.close()
    sample_width = p.get_sample_size(FORMAT)    # Size in bytes of one sample in the given format; read it before terminating PyAudio
    p.terminate()
    GPIO.cleanup()    # Release the GPIO pin so the next run starts from a clean state
 
    wf = wave.open(WAVE_OUTPUT_FILENAME, 'wb')
    wf.setnchannels(1)
    wf.setsampwidth(sample_width)
    wf.setframerate(RATE)
    wf.writeframes(b''.join(frames))
    wf.close()
   
    return
 
# rec.py can be run directly for testing, while importing it does not start a recording
if __name__ == '__main__':
    rec_fun()
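
Before playing the file back, it can help to confirm that the recording has the expected format. The following is a minimal sketch using only the standard `wave` module; the function name `wav_info` is our own and is not part of rec.py. With the settings above we would expect 1 channel, 2 bytes per sample (16-bit), and a frame rate of 44100.

```python
import wave


def wav_info(path):
    """Return (channels, sample_width_bytes, frame_rate, duration_seconds) for a WAV file."""
    with wave.open(path, 'rb') as wf:
        channels = wf.getnchannels()
        sample_width = wf.getsampwidth()    # bytes per sample, 2 for 16-bit audio
        frame_rate = wf.getframerate()      # frames (samples per channel) per second
        # Duration is the number of frames divided by the frames per second
        duration = wf.getnframes() / float(frame_rate)
    return channels, sample_width, frame_rate, duration
```

Calling `wav_info('command.wav')` after a recording shows how long the button was held. At 44.1 kHz, 16-bit mono, each second of audio takes 44100 x 2 = 88,200 bytes, and each 512-frame chunk covers roughly 11.6 ms.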
 
Run the button-recording program
python3 rec.py
 
Play back the recording
aplay command.wav
 
 
Using it with the API
 
apirec.py
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
 
from flask import Flask, jsonify, make_response, request, abort
# The flask.ext.* import style was removed in Flask 1.0; import the extension package directly
from flask_httpauth import HTTPBasicAuth
 
 
import time
import os
from os import path
import speech_recognition as sr
from textblob import TextBlob
from gtts import gTTS
 
app = Flask(__name__, static_url_path="")
auth = HTTPBasicAuth()
 
fromLanguage = "zh-TW"
toLanguage = "en"
 
 
def text_speaker(content):
    tts = gTTS(text=content, lang=fromLanguage)
    tts.save("tts.mp3")
    os.system('omxplayer -p -o local tts.mp3')
    #time.sleep(0.5)
    return None
 
# Spoken in zh-TW: "Hello, I am the speech recognition assistant. Starting the service..."
text_speaker("您好,我是語音辨識助理,啟動服務中...")
 
@auth.get_password
def get_password(username):
    if username == 'pi':
        return '999999'
    return None
 
@app.errorhandler(400)
def bad_request(error):
    return make_response(jsonify({'error': 'Bad request'}), 400)
 
@auth.error_handler
def unauthorized():
    # Return 403 instead of 401 to prevent browsers from displaying the default auth dialog
    return make_response(jsonify({'error': 'Unauthorized'}), 403)
 
@app.errorhandler(404)
def not_found(error):
    return make_response(jsonify({'error': 'Not found'}), 404)
 
@app.route('/todo/api/v1.0/tasks', methods=['POST'])
@auth.login_required
def create_task():
    if not request.json or 'content' not in request.json:
        abort(400)
    task = {
        'content': request.json['content'],
        'status': 'Created'
    }
    text_speaker(request.json['content'])
    return jsonify(task), 201
 
 
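if __name__ == '__main__':
    # Spoken in zh-TW: "Service started successfully. The speech recognition assistant is now at your service."
    text_speaker("服務啟動成功,語音辨識助理開始為您服務。")
    app.run(host='0.0.0.0', port=5000,debug=False)
    # app.run() blocks until the server stops, so the farewell below plays on shutdown.
    # Spoken in zh-TW: "The speech recognition assistant service has ended. Thank you for using it."
    text_speaker("結束語音辨識助理服務,謝謝您的使用。")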
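
To exercise the service from another machine, a client has to POST JSON with a `content` field to `/todo/api/v1.0/tasks` and supply the basic-auth credentials defined in apirec.py. The following is a rough sketch using only the standard library. The function names `build_tts_request` and `send` are our own, and the host name `raspberrypi.local` in the usage note is an assumption; substitute the address of your own Pi.

```python
import base64
import json
import urllib.request


def build_tts_request(host, content, username='pi', password='999999'):
    """Build a POST request that asks the service to speak `content`."""
    url = 'http://{0}:5000/todo/api/v1.0/tasks'.format(host)
    body = json.dumps({'content': content}).encode('utf-8')
    # HTTP basic auth: base64 of "username:password"
    credentials = '{0}:{1}'.format(username, password).encode('utf-8')
    token = base64.b64encode(credentials).decode('ascii')
    req = urllib.request.Request(url, data=body, method='POST')
    req.add_header('Content-Type', 'application/json')
    req.add_header('Authorization', 'Basic ' + token)
    return req


def send(req):
    """Send the request and return (status_code, decoded_json_body)."""
    with urllib.request.urlopen(req) as resp:
        return resp.status, json.loads(resp.read().decode('utf-8'))
```

For example, `send(build_tts_request('raspberrypi.local', '你好'))` should make the Pi speak the text and, judging from `create_task()` above, return status 201 with the created task. Keeping request construction separate from sending makes it easy to inspect the request without a running server.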
 
 
[References]

IT|Software|Applications|Speech Recognition|Real-Time Spoken Interpretation DIY-1

[Materials]
 
1. Raspberry Pi board x 1 (Raspberry Pi 3 Model B)
2. Speaker, 0.5 W x 1
3. 3.5 mm audio cable x 1
4. 3.5 mm microphone x 1
5. Sound card
 
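The Raspberry Pi has no built-in microphone input, so to use the STT (speech-to-text) feature you first need to attach a microphone. There are two ways to add one; for the microphone setup, see the following post. --> IoT|Hardware|Raspberry Pi|Setting Up an External Microphone and Speaker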
 
[Software Installation]
 
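SpeechRecognition: this module wraps several commonly used speech recognition systems behind one interface, which saves considerable development time. It integrates the following well-known STT APIs: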
1. CMU Sphinx (works offline)
2. Google Speech Recognition
3. Wit.ai
4. Microsoft Bing Voice Recognition
5. api.ai
6. IBM Speech to Text
 
Install the Python 3 SpeechRecognition module
sudo pip3 install SpeechRecognition
 
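TextBlob: a powerful text-analysis toolkit with convenient tools for part-of-speech tagging, noun-phrase extraction, sentiment analysis, text classification, spelling correction, translation, language detection, and more. A later post will explore this module in depth; here we use only its translation feature (via the Google API).
 
Install TextBlob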
sudo pip3 install -U textblob
 
Download the required corpora. To store them in a different location, set the NLTK_DATA environment variable.
sudo python3 -m textblob.download_corpora
 
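gTTS (Google Text to Speech): this module provides a convenient interface to Google's Text to Speech API. We use Google's TTS here rather than the ITRI (Industrial Technology Research Institute) TTS because the ITRI service is too slow and too unstable. Its voice quality is better and more natural, but for fluent real-time translation Google TTS takes priority. The ITRI TTS could still be added to the program later so that users can switch between the two.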
 
Install gTTS
sudo pip3 install gTTS
 
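PyAudio: SpeechRecognition needs PyAudio for microphone input (sr.Microphone), and rec.py uses it to record. gTTS saves the synthesized speech as an mp3 file, which the scripts below play with omxplayer.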
sudo apt-get install libasound-dev
sudo apt-get install python-pyaudio python3-pyaudio
 
Convert Chinese speech to text, translate the text into English, and play the synthesized speech through the speaker.
python3 translatedSpeech.py
 
translatedSpeech.py
#!/usr/bin/env python3
# -*- coding: utf-8 -*-

fromLanguage = "zh-TW"
toLanguage = "en"

import time

import os
from os import path

import speech_recognition as sr
from textblob import TextBlob

from gtts import gTTS

while True:

  try:
    r = sr.Recognizer()
    with sr.Microphone() as source:
       print("Say something!")
       audio = r.listen(source)

    # sttTXT_org = r.recognize_google(audio, key="AIzaSyDMjV3fPEyVyQ6CGv6hZ-5Ndn9vCn-2NtI", language$
    sttTXT_org = r.recognize_google(audio, language=fromLanguage)
    print("Google Speech Recognition thinks you said: " + sttTXT_org)

    sttTXT_tblob = TextBlob(sttTXT_org)

    blobTranslated = sttTXT_tblob.translate(to=toLanguage)
    print("Translated --> " + blobTranslated.raw)

    # Spoken in zh-TW: "The sentence you just said is. <recognized text> The English translation is."
    tts = gTTS(text="您剛剛說的語句是." +sttTXT_org+"英文翻譯是.", lang=fromLanguage)
    tts.save("tts.mp3")
    os.system('omxplayer -p -o local tts.mp3')
    time.sleep(0.5)

    tts = gTTS(blobTranslated.raw + ". ", lang=toLanguage)
    tts.save("tts.mp3")
    os.system('omxplayer -p -o local tts.mp3')
    time.sleep(0.5)


  except sr.UnknownValueError:
       print("Google Speech Recognition could not understand audio")
       # Spoken in zh-TW: "I did not understand what you just said, please say it again."
       tts = gTTS(text="您剛剛說的我沒聽懂,麻煩您再說一次", lang=fromLanguage)
       tts.save("tts.mp3")
       os.system('omxplayer -p -o local tts.mp3')
       time.sleep(0.5)

  except sr.RequestError as e:
       print("Could not request results from Google Speech Recognition service; {0}".format(e))
       time.sleep(0.5)
 
Test example
你好 (Hello)
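
The script above runs three stages in a fixed order: speech to text, translation, then speech output in both languages. As a rough sketch of that flow, the function below takes each stage as a callable, so any one of them can be swapped or tested without a microphone or network access. The name `interpret_once` and its parameters are hypothetical and do not appear in translatedSpeech.py. One reason to keep the translator swappable: to the best of our knowledge, newer TextBlob releases have removed `translate()`, so a different translation backend may be needed.

```python
def interpret_once(audio, recognize, translate, speak,
                   from_language='zh-TW', to_language='en'):
    """Run one speech -> text -> translation -> speech round.

    recognize(audio, language) -> recognized text
    translate(text, source_language, target_language) -> translated text
    speak(text, language) -> None (plays the text aloud)
    """
    original = recognize(audio, from_language)
    translated = translate(original, from_language, to_language)
    speak(original, from_language)    # read back what was heard
    speak(translated, to_language)    # then read the translation
    return original, translated


# Possible wiring with the libraries used in this post (assumption, not run here):
#   recognize = lambda audio, lang: r.recognize_google(audio, language=lang)
#   speak     = a function that saves gTTS output to tts.mp3 and plays it with omxplayer
```

With stand-in functions the flow can be checked on any machine, for example `interpret_once(None, lambda a, l: '你好', lambda t, s, d: 'Hello', lambda t, l: None)` returns `('你好', 'Hello')`.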
 
Convert Chinese speech to text and play the synthesized speech through the speaker.
python3 textSpeech.py
 
textSpeech.py
#!/usr/bin/env python3
# -*- coding: utf-8 -*-

fromLanguage = "zh-TW"
toLanguage = "en"

import time

import os
from os import path

import speech_recognition as sr
from textblob import TextBlob

from gtts import gTTS

while True:

  try:
    r = sr.Recognizer()
    with sr.Microphone() as source:
       print("Say something!")
       audio = r.listen(source)

    # sttTXT_org = r.recognize_google(audio, key="AIzaSyDMjV3fPEyVyQ6CGv6hZ-5Ndn9vCn-2NtI", language$
    sttTXT_org = r.recognize_google(audio, language=fromLanguage)
    print("Google Speech Recognition thinks you said --> " + sttTXT_org)

    # Spoken in zh-TW: "The sentence you just said is. <recognized text>"
    tts = gTTS(text="您剛剛說的語句是." +sttTXT_org, lang=fromLanguage)
    tts.save("tts.mp3")
    os.system('omxplayer -p -o local tts.mp3')
    time.sleep(0.5)

  except sr.UnknownValueError:
       print("Google Speech Recognition could not understand audio")
       # Spoken in zh-TW: "I did not understand what you just said, please say it again."
       tts = gTTS(text="您剛剛說的我沒聽懂,麻煩您再說一次", lang=fromLanguage)
       tts.save("tts.mp3")
       os.system('omxplayer -p -o local tts.mp3')
       time.sleep(0.5)

  except sr.RequestError as e:
       print("Could not request results from Google Speech Recognition service; {0}".format(e))
       time.sleep(0.5)
 
Test example
Counting: 1 2 3 4
 
Playback device
os.system('omxplayer -p -o local tts.mp3')
--> -o  --adev  device          Audio out device      : e.g. hdmi/local/both/alsa[:device]
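
Since the output device is the only part of the command that changes between setups, it can be convenient to build the command in one place. The following is a small sketch; the helper names `omxplayer_command` and `play` are our own, and the list of accepted devices is taken from the help line above.

```python
import subprocess

# Device names listed in the omxplayer help text for -o / --adev
VALID_DEVICES = ('hdmi', 'local', 'both', 'alsa')


def omxplayer_command(path, device='local'):
    """Return the omxplayer argument list for playing `path` on `device`."""
    # "alsa" may carry a device suffix such as "alsa:hw:1,0"; validate only the prefix
    base = device.split(':', 1)[0]
    if base not in VALID_DEVICES:
        raise ValueError('unknown audio device: {0}'.format(device))
    return ['omxplayer', '-p', '-o', device, path]


def play(path, device='local'):
    """Play the file and return omxplayer's exit code."""
    return subprocess.call(omxplayer_command(path, device))
```

`play('tts.mp3')` is equivalent to the `os.system` call used in the scripts, while `play('tts.mp3', 'hdmi')` sends the audio to HDMI instead. Passing an argument list to `subprocess.call` also avoids shell quoting problems if the file name ever contains spaces.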
 
[References]