Ep. 259 - June 9, 2024

TechcraftingAI NLP

Brad Edwards

271 episodes

TechcraftingAI NLP brings you daily summaries of the latest arXiv Computation and Language research.

Technology

37 minutes 33 seconds

ArXiv NLP research for Sunday, June 09, 2024.

00:19: How Alignment and Jailbreak Work: Explain LLM Safety through Intermediate Hidden States

01:40: DomainRAG: A Chinese Benchmark for Evaluating Domain-specific Retrieval-Augmented Generation

03:25: Do LLMs Exhibit Human-Like Reasoning? Evaluating Theory of Mind in LLMs for Open-Ended Responses

05:08: MS-HuBERT: Mitigating Pre-training and Inference Mismatch in Masked Language Modelling methods for learning Speech Representations

06:17: SinkLoRA: Enhanced Efficiency and Chat Capabilities for Long-Context Large Language Models

08:11: Peer Review as A Multi-Turn and Long-Context Dialogue with Role-Based Interactions

09:54: MoPS: Modular Story Premise Synthesis for Open-Ended Automatic Story Generation

11:20: QGEval: A Benchmark for Question Generation Evaluation

12:44: MrRank: Improving Question Answering Retrieval System through Multi-Result Ranking Model

13:43: Arabic Diacritics in the Wild: Exploiting Opportunities for Improved Diacritization

14:46: The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models

16:30: RE-RAG: Improving Open-Domain QA Performance and Interpretability with Relevance Estimator in Retrieval-Augmented Generation

18:14: Hidden Holes: topological aspects of language models

19:46: Do Prompts Really Prompt? Exploring the Prompt Understanding Capability of Whisper

20:40: Seventeenth-Century Spanish American Notary Records for Fine-Tuning Spanish Large Language Models

22:02: MedREQAL: Examining Medical Knowledge Recall of Large Language Models via Question Answering

23:12: II-Bench: An Image Implication Understanding Benchmark for Multimodal Large Language Models

25:17: Zero-Shot End-To-End Spoken Question Answering In Medical Domain

26:27: Are Large Language Models Actually Good at Text Style Transfer?

27:32: Feriji: A French-Zarma Parallel Corpus, Glossary & Translator

28:56: TTM-RE: Memory-Augmented Document-Level Relation Extraction

30:12: Why Don't Prompt-Based Fairness Metrics Correlate?

31:27: Hello Again! LLM-powered Personalized Agent for Long-term Dialogue

33:12: Semisupervised Neural Proto-Language Reconstruction

34:12: Prompting Large Language Models with Audio for General-Purpose Speech Summarization

35:14: A Dual-View Approach to Classifying Radiology Reports by Co-Training

36:07: ThaiCoref: Thai Coreference Resolution Dataset