Vicuna: A New Open-Source Chatbot Competing with GPT-4
In the world of large language models (LLMs), chatbot systems have seen significant advancements, with OpenAI’s ChatGPT being a prime example. However, the lack of clarity in ChatGPT’s training and architecture details has limited research and open-source innovation. Enter Vicuna-13B, an open-source chatbot inspired by the Meta LLaMA and Stanford Alpaca project, boasting an enhanced dataset and user-friendly, scalable infrastructure. By fine-tuning a LLaMA base model on user-shared conversations from ShareGPT.com, Vicuna-13B demonstrates competitive performance compared to other open-source models like Stanford Alpaca.
Challenges in Evaluating AI Chatbots
Evaluating AI chatbots is no easy feat, as it involves assessing language understanding, reasoning, and context awareness. As AI chatbots become more advanced, existing open benchmarks may no longer be sufficient. For example, the evaluation dataset used in Stanford’s Alpaca, self-instruct, can be effectively answered by SOTA chatbots, making it difficult for humans to discern performance differences. Other limitations include training/test data contamination and the high cost of creating new benchmarks. To address these issues, we propose an evaluation framework based on GPT-4 to automate chatbot performance assessment.
Model Details
Vicuna is an open-source chatbot trained by fine-tuning LLaMA on user-shared conversations collected from ShareGPT. It is an auto-regressive language model, based on the transformer architecture. Vicuna was trained between March 2023 and April 2023, with development led by a team from UC Berkeley, CMU, Stanford, and UC San Diego.
Intended Use and Users
The primary intended use of Vicuna is research on large language models and chatbots. Its primary intended users are researchers and hobbyists in natural language processing, machine learning, and artificial intelligence.
More models
Here you can find more AI models: https://huggingface.co/models?other=llama&p=1&sort=downloads
and see a comparison and benchark of AI models: https://lmsys.org/blog/2023-05-03-arena/
Leave a Reply