Use the vitals package with ellmer to evaluate and compare the accuracy of LLMs, including writing evals to test local models.
Abstract: Large Language Models (LLMs) have grown in popularity in recent years and are now employed in a variety of software engineering domains thanks to their Natural Language Processing (NLP) ...
Litmus is a comprehensive tool designed for testing and evaluating HTTP Requests and Responses, especially for Large Language Models (LLMs). It combines a powerful API, a robust worker service, a user ...