[Voice Dialogue] On the Latest Voice Dialogue Technology #3-5

AI Shift Academy

株式会社AI Shift

30 episodes

6 days ago

An AI literacy podcast from 株式会社AI Shift, a CyberAgent Group company, that interprets the evolution of AI technology as a story.
▼ Feedback form: send us your comments and impressions via the link below.
https://forms.gle/djeA4bbMgVkJMdK79
▼ Links
AI Shift website: https://www.ai-shift.co.jp/
AI Shift X account: https://x.com/AIShift_PR
Oikawa (及川, host): https://x.com/cyber_oikawa

Technology

24 minutes 2 seconds

1 month ago

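AI Shift Academy (#シフアカ)

From Pipelines to E2E! Where the Latest Voice Dialogue Technology Stands Today

This episode takes an in-depth look at the paradigm shift toward end-to-end (E2E) speech models, which aim for natural, low-latency, human-like conversation.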

・ Half-duplex vs Full-duplex

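"LLaMA-Omni" responds at very high speed, while "Moshi" achieves simultaneous two-way conversation by listening while it speaks. How do their architectures and training methods differ?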

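・ Behind the technology

A deep dive into how the "pseudo" full-duplex of the OpenAI Realtime API works, and into how the "Mimi" audio codec structures its tokenization (semantic/acoustic).

・ Challenges and outlook

We discuss the shortage of dialogue data, security evaluation, and the coming move toward multimodality.

A must-listen for anyone who wants to catch up on the frontier of voice AI!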

▼ Send us your feedback here