Voice input for Yubimozi HoloLens: Azure Speech to Text

Today's post is an Azure one.

YubimoziHoloLens, which I built at the recent OsakaHoloLens Hackathon, accepts voice input in addition to keyboard input.

This post walks through that implementation. (That said, it was crammed in during the last hour of the event, so it is not the best way to use the service.)

〇Azure Speech to Text

Azure Speech to Text is one of the services offered on Azure, Microsoft's cloud platform.

It recognizes speech and converts it to text.

The implementation here is based on the information published on Microsoft Learn.

docs.microsoft.com

〇Creating the Azure resource

①Open the Azure portal. (https://portal.azure.com/#home)

②From [Create a resource], type "Speech" (音声 in the Japanese portal) into the search box and create a Speech resource.

③Create a new resource group. I set the region to West US and, since a free plan was available, chose the Free pricing tier.

④The resource has been created.

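The key and region of the new resource are what the script later in this post needs. As a rough sketch of where they end up, assuming the key is copied from the resource's "Keys and Endpoint" page and that the West US region is written as "westus": the class name `SpeechResourceSettings` and the placeholder key are my own inventions for illustration, and the `Microsoft.CognitiveServices.Speech` namespace comes from the Speech SDK introduced in the next section.

```csharp
using Microsoft.CognitiveServices.Speech;

// Hypothetical helper, not part of the original project.
// The key below is a placeholder, not a real credential.
public static class SpeechResourceSettings
{
    public const string SubscriptionKey = "YOUR_SPEECH_RESOURCE_KEY";
    public const string Region = "westus"; // West US, the region chosen in step ③

    // Builds a Speech SDK configuration from the resource's key and region.
    public static SpeechConfig CreateConfig()
    {
        return SpeechConfig.FromSubscription(SubscriptionKey, Region);
    }
}
```

In the actual script these two values are exposed as public string fields instead, so they can be typed into the Unity Inspector rather than compiled into the code.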

〇Implementation in the application

A Speech SDK package for use inside Unity is provided. It can be downloaded from the following link:

https://aka.ms/csspeech/unitypackage/

using System.Collections;
using System.Collections.Generic;
using UnityEngine;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Translation;
using Microsoft.MixedReality.Toolkit.UI;
using TMPro;

public class Translatio : MonoBehaviour
{
    public TextMeshPro recognizedText;
    public TextMeshPro translatedText;
    public PressableButton micButton;

    public string SpeechServiceSubscriptionKey = "";
    public string SpeechServiceRegion = "";

    private bool waitingforReco;
    private string recognizedString;
    private string translatedString;

    private bool micPermissionGranted = false;

    private object threadLocker = new object();

    public async void ButtonClick()
    {
        var translationConfig = SpeechTranslationConfig.FromSubscription(SpeechServiceSubscriptionKey, SpeechServiceRegion);
        translationConfig.SpeechRecognitionLanguage = "ja-JP";
        translationConfig.AddTargetLanguage("fr"); // Left over from the sample code; a translation target is specified here but not used in this app.

        using (var recognizer = new TranslationRecognizer(translationConfig))
        {
            lock (threadLocker)
            {
                waitingforReco = true;
            }

            var result = await recognizer.RecognizeOnceAsync().ConfigureAwait(false);

            // After ConfigureAwait(false) this continuation may run on a worker thread,
            // so write the shared strings under the same lock that Update() reads them with.
            lock (threadLocker)
            {
                if (result.Reason == ResultReason.TranslatedSpeech)
                {
                    recognizedString = result.Text;
                    foreach (var element in result.Translations)
                    {
                        translatedString = element.Value;
                    }
                }
                else if (result.Reason == ResultReason.NoMatch)
                {
                    recognizedString = "NOMATCH: Speech could not be recognized.";
                }
                else if (result.Reason == ResultReason.Canceled)
                {
                    var cancellation = CancellationDetails.FromResult(result);
                    recognizedString = $"CANCELED: Reason={cancellation.Reason} ErrorDetails={cancellation.ErrorDetails}";
                }
                waitingforReco = false;
            }
        }
    }


    // Start is called before the first frame update
    void Start()
    {
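        if (recognizedText == null)
        {
            // Update() writes to recognizedText every frame, so it must be assigned too.
            UnityEngine.Debug.LogError("recognizedText property is null! Assign a UI TextMeshPro Text element to it.");
        }
        else if (translatedText == null)
        {
            UnityEngine.Debug.LogError("translatedText property is null! Assign a UI TextMeshPro Text element to it.");
        }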
        else if (micButton == null)
        {
            UnityEngine.Debug.LogError("micButton property is null! Assign a MRTK Pressable Button to it.");
        }
        else
        {
            micPermissionGranted = true;
            micButton.ButtonPressed.AddListener(ButtonClick);
        }
    }

    // Update is called once per frame
    void Update()
    {
        lock (threadLocker)
        {
            recognizedText.text = recognizedString;
            translatedText.text = translatedString;
        }
    }
}

This is adapted from a script that uses Azure Speech to Text to recognize speech and translate it into French.

The important part is the following:

        var translationConfig = SpeechTranslationConfig.FromSubscription(SpeechServiceSubscriptionKey, SpeechServiceRegion);
        translationConfig.SpeechRecognitionLanguage = "ja-JP";

translationConfig.SpeechRecognitionLanguage is set to Japanese ("ja-JP"), so recognition targets Japanese speech.

As a result, the recognized text is displayed in recognizedText.
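How the text gets from the recognizer to recognizedText deserves a note. The script never touches the TextMeshPro object from the recognition callback. It stores the string in a field, and Update() copies that field to the UI on the main thread, with a lock around both sides. A stripped-down sketch of that hand-off in plain C#, with no Unity or Azure dependency, might look like this. The class name `RecognitionBuffer` is hypothetical and does not appear in the script above.

```csharp
// Hypothetical helper illustrating the lock-based hand-off used in the script.
// A worker thread writes the latest text; the main thread reads it later.
public sealed class RecognitionBuffer
{
    private readonly object locker = new object();
    private string latest = "";

    // Called from the recognition continuation, possibly on a thread-pool thread.
    public void Set(string text)
    {
        lock (locker)
        {
            latest = text;
        }
    }

    // Called from the main thread, for example inside Unity's Update().
    public string Get()
    {
        lock (locker)
        {
            return latest;
        }
    }
}
```

In the script, recognizedString plays the role of `latest`, the assignment in ButtonClick corresponds to `Set`, and the copy inside Update() corresponds to `Get`.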

That's it. It was a rush job, but this is how the speech-recognition input is implemented.
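Since the translation result is never used, the same behaviour should be achievable with the SDK's plain speech recognizer instead of the translation one. The following is a minimal sketch under that assumption, not the code that runs in YubimoziHoloLens. The component name `SpeechOnlyInput` and the method name `RecognizeOnce` are made up for illustration, while `SpeechConfig`, `SpeechRecognizer`, `RecognizeOnceAsync` and `ResultReason.RecognizedSpeech` are part of the Speech SDK.

```csharp
using UnityEngine;
using Microsoft.CognitiveServices.Speech;
using TMPro;

// Hypothetical component: a speech-only variant of the script above.
public class SpeechOnlyInput : MonoBehaviour
{
    public TextMeshPro recognizedText;
    public string SpeechServiceSubscriptionKey = "";
    public string SpeechServiceRegion = "";

    private readonly object threadLocker = new object();
    private string recognizedString = "";

    // Hook this up to a button's pressed event, as ButtonClick is in the original script.
    public async void RecognizeOnce()
    {
        var config = SpeechConfig.FromSubscription(SpeechServiceSubscriptionKey, SpeechServiceRegion);
        config.SpeechRecognitionLanguage = "ja-JP";

        // With no AudioConfig argument the recognizer listens on the default microphone.
        using (var recognizer = new SpeechRecognizer(config))
        {
            var result = await recognizer.RecognizeOnceAsync().ConfigureAwait(false);

            string text = "";
            if (result.Reason == ResultReason.RecognizedSpeech)
            {
                text = result.Text;
            }
            else if (result.Reason == ResultReason.NoMatch)
            {
                text = "NOMATCH: Speech could not be recognized.";
            }
            else if (result.Reason == ResultReason.Canceled)
            {
                var cancellation = CancellationDetails.FromResult(result);
                text = $"CANCELED: Reason={cancellation.Reason} ErrorDetails={cancellation.ErrorDetails}";
            }

            // The continuation may run off the main thread, so only store the string here.
            lock (threadLocker)
            {
                recognizedString = text;
            }
        }
    }

    void Update()
    {
        // Copy the stored string to the UI on Unity's main thread.
        lock (threadLocker)
        {
            if (recognizedText != null)
            {
                recognizedText.text = recognizedString;
            }
        }
    }
}
```

This drops the `Microsoft.CognitiveServices.Speech.Translation` namespace and the unused French target, and the success case is reported as `ResultReason.RecognizedSpeech` rather than `ResultReason.TranslatedSpeech`.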