G-EVAL: NLG Evaluation using GPT-4 with Better Human Alignment
https://arxiv.org/abs/2303.16634
"The quality of texts generated by natural language generation (NLG) systems is hard to measure automatically. Conventional reference-based metrics, such as BLEU and ROUGE, have been shown to have relatively low correlation with human judgments, especially …"

Background of the paper
Difficulty of evaluating natural language generation (NLG) systems: …
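A small example shows why reference-based overlap metrics can disagree with human judgment. The sketch below is a simplified ROUGE-1 F1 that scores clipped unigram overlap between a candidate and a reference. It is not the official ROUGE implementation and not the G-Eval method. Tokenization is only lowercasing plus whitespace splitting, with no stemming. The function name `rouge1_f1` and the example sentences are hypothetical illustrations.

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 F1 (hypothetical helper).

    Assumption: lowercase + whitespace tokenization only; the official ROUGE
    tooling also applies stemming and other preprocessing.
    """
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count per word (clipped overlap).
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


reference = "the cat sat on the mat"
paraphrase = "a feline was sitting on the rug"  # same meaning, different words
word_salad = "mat the on sat cat the"  # same words, no coherent meaning

print(round(rouge1_f1(paraphrase, reference), 3))  # 0.308
print(rouge1_f1(word_salad, reference))  # 1.0
```

A human would rate the paraphrase as the better output. The overlap metric instead gives the scrambled word salad a perfect score, because it only counts shared words. This is the kind of gap between surface overlap and human judgment that motivates reference-free, LLM-based evaluators.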