
[Paper Recommendation] Can Large Language Models be Trusted for Evaluation? Scalable Meta-Evaluation of LLMs as Evaluators via Agent Debate

First man 2025. 3. 13. 04:09

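Hello,

While in Toronto on the IITP artificial intelligence dispatch program at the University of Toronto, I am also working on a corporate project.

Today I'm sharing a paper recommended by the person in charge of the LG Toronto Agent AI Project, which I am currently working on.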

https://arxiv.org/abs/2401.16788

 

Can Large Language Models be Trusted for Evaluation? Scalable Meta-Evaluation of LLMs as Evaluators via Agent Debate

Despite the utility of Large Language Models (LLMs) across a wide range of tasks and scenarios, developing a method for reliably evaluating LLMs across varied contexts continues to be challenging. Modern evaluation approaches often use LLMs to assess respo


Submission history

Tue, 30 Jan 2024 07:03:32 UTC 
