LMArena

Arena
Type of site	Artificial intelligence
Founded	March 2025
Country of origin	United States
Founders	Wei-Lin Chiang; Anastasios N. Angelopoulos; Ion Stoica
URL	arena.ai
Registration	Not required
Launched	May 3, 2023

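Arena (formerly Chatbot Arena and LMArena) is a public online chatbot platform and leaderboard that evaluates large language models through anonymous, crowdsourced pairwise comparisons.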

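A user enters a prompt, two anonymous models each generate a response, and the user votes for the better one; the model names are revealed only after the vote is cast. Users can also choose specific models to test.^[1]^[2]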
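The source does not say how individual votes are turned into a ranking. As an illustration only, the following minimal sketch assumes a simple Elo-style update, one common way to aggregate pairwise comparisons; it is not presented as the platform's actual method. The model names, the `rate` function, and the parameters `k` and `initial` are hypothetical.

```python
from collections import defaultdict


def expected_score(r_a: float, r_b: float) -> float:
    """Elo-model probability that the model rated r_a beats the one rated r_b."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def rate(votes, k: float = 32.0, initial: float = 1000.0) -> dict:
    """Aggregate pairwise votes into ratings (assumed Elo-style scheme).

    votes: iterable of (model_a, model_b, winner), winner in {"a", "b", "tie"}.
    """
    ratings = defaultdict(lambda: initial)
    for model_a, model_b, winner in votes:
        # Actual outcome from model_a's point of view.
        score_a = {"a": 1.0, "b": 0.0, "tie": 0.5}[winner]
        exp_a = expected_score(ratings[model_a], ratings[model_b])
        delta = k * (score_a - exp_a)
        # Zero-sum update: what one model gains, the other loses.
        ratings[model_a] += delta
        ratings[model_b] -= delta
    return dict(ratings)


# Hypothetical votes; each tuple is one anonymous head-to-head comparison.
votes = [
    ("model-x", "model-y", "a"),
    ("model-y", "model-z", "a"),
    ("model-x", "model-z", "a"),
    ("model-x", "model-y", "tie"),
]
ratings = rate(votes)
leaderboard = sorted(ratings, key=ratings.get, reverse=True)
```

Because each update is zero-sum, the total rating across all models stays constant, and the result depends on the order in which votes are processed. That order dependence is one reason a production leaderboard might prefer a batch fit over sequential updates.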

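Arena is influential in the artificial intelligence field. Many large companies make their language models available on the platform, including GPT, Gemini,^[3] Claude,^[4] Mistral, Grok, DeepSeek, and Kimi. They use the platform's rankings to promote their products and use users' conversations on the site to train their models. The site is also used to test model versions that have not yet been publicly released.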

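For example, the Chinese company DeepSeek tested prototype models on Arena months before its R1 model drew attention from Western media.^[5] Other cases of pre-release testing on the platform include OpenAI testing a GPT-5 variant under the codename "summit" and Google DeepMind testing Gemini-2.5-Flash-Image under the codename "nano-banana".^[6]^[7]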

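Arena's evaluation methodology has itself become a subject of academic analysis: studies have identified limitations and proposed improvements. The platform has since continued to update its policies and methodology in response to this research.^[8]^[9]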

References

[1] Hart, Robert. What AI Is The Best? Chatbot Arena Relies On Millions Of Human Votes. Forbes. 2024-07-18 [2025-04-21].
[2] Kruppa, Miles. The UC Berkeley Project That Is the AI Industry's Obsession. The Wall Street Journal. 2024-12-05 [2025-04-21].
[3] Nuñez, Michael. Google Gemini unexpectedly surges to No. 1, over OpenAI, but benchmarks don't tell the whole story. VentureBeat. 2024-11-15 [2025-04-21].
[4] Edwards, Benj. "The king is dead"—Claude 3 surpasses GPT-4 on Chatbot Arena for the first time. Ars Technica. 2024-03-27 [2025-04-21].
[5] Metz, Rachel. Before DeepSeek Blew Up, Chatbot Arena Announced Its Arrival. Bloomberg News. 2025-02-18 [2025-04-21].
[6] Ziff, Maxwell. Google Gemini's AI image model gets a 'bananas' upgrade. TechCrunch. 2025-08-26 [2025-08-27].
[7] Langley, Hugh. Is Google behind a mysterious new AI image generator? These bananas might confirm it. Business Insider. 2025-08-19 [2025-08-27].
[8] Stokel-Walker, Chris. Hundreds of rigged votes can skew AI model rankings on Chatbot Arena, study finds. Fast Company. 2025-02-06 [2025-04-21].
[9] Wiggers, Kyle. The AI industry is obsessed with Chatbot Arena, but it might not be the best benchmark. TechCrunch. 2024-09-05 [2025-04-21].

External links

Official website

