# Your first evals!

### What is a Golden Q\&A?

A Golden Q\&A is a list of common questions and accurate answers, created by experts on your team. These are used to test your AI Agent's performance.
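As an illustration, a Golden Q\&A sheet could be stored as a small CSV and loaded with Python's standard library. This is a minimal sketch: the column names `question` and `golden_answer`, the sample rows, and the helper `load_golden_qna` are assumptions made for this example, not a format required by Gooey.AI.

```python
import csv
import io

# Hypothetical Golden Q&A sheet; column names and rows are illustrative only.
GOLDEN_CSV = """question,golden_answer
What is the clinic's opening time?,The clinic opens at 9 am on weekdays.
Which crops suit sandy soil?,"Groundnut, cassava, and millet grow well in sandy soil."
"""

def load_golden_qna(text):
    """Parse CSV text into a list of {'question', 'golden_answer'} dicts."""
    rows = list(csv.DictReader(io.StringIO(text)))
    # Drop rows with a missing question or answer so the test set stays clean.
    return [r for r in rows if r["question"].strip() and r["golden_answer"].strip()]

golden = load_golden_qna(GOLDEN_CSV)
```

Keeping one question and one expert answer per row makes the sheet easy to review by the experts who maintain it.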

### Bulk Evaluation Process (Overview)

* The Golden Q\&A sheet is used as the test set in the [bulk evaluator](https://gooey.ai/bulk).
* For each question, the AI Agent generates an answer.
* The evaluator compares the AI Agent answer to the expert (golden) answer.
* Scores are based on technical accuracy, citation correctness, and answer quality.
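The steps above can be sketched as a simple loop. This is only an illustration of the process, not how the Gooey.AI evaluator is implemented: it swaps in a crude word-overlap (Jaccard) score as a stand-in for the real scoring criteria, and `fake_agent`, `overlap_score`, and `bulk_evaluate` are hypothetical names invented for this example.

```python
import re

def tokens(text):
    """Lowercase word tokens, used for a crude similarity measure."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def overlap_score(agent_answer, golden_answer):
    """Jaccard overlap between answers, in [0, 1]. A stand-in, not Gooey.AI's scoring."""
    a, g = tokens(agent_answer), tokens(golden_answer)
    if not a and not g:
        return 1.0
    return len(a & g) / len(a | g)

def bulk_evaluate(golden_rows, agent):
    """Run the agent on every golden question and score each answer."""
    results = []
    for row in golden_rows:
        answer = agent(row["question"])
        results.append({
            "question": row["question"],
            "agent_answer": answer,
            "score": overlap_score(answer, row["golden_answer"]),
        })
    return results

# Hypothetical agent for demonstration; a real run would call your AI Agent.
def fake_agent(question):
    return "The clinic opens at 9 am on weekdays."

golden_rows = [
    {"question": "When does the clinic open?",
     "golden_answer": "The clinic opens at 9 am on weekdays."},
]
results = bulk_evaluate(golden_rows, fake_agent)
```

Word overlap cannot judge citation correctness or technical accuracy, which is why an evaluator built for those criteria is used in practice.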

<figure><img src="/files/nxLXhr3UnYBoqywrHwuY" alt=""><figcaption></figcaption></figure>

<figure><img src="/files/8vZsDJnsPutvSYD8XMnb" alt=""><figcaption></figcaption></figure>

## Why is bulk run and evaluation important?

Bulk runs and evaluations help you:

* Choose the right
  * LLM
  * TTS (text-to-speech)
  * STT (speech-to-text)
  * Translations
* Improve your AI Agent's overall responses
* Assess time versus cost for your choice of pipeline
* <mark style="background-color:green;">Check regressions regularly</mark>

### Why do you need a bulk runner and evaluations? <a href="#id-4zynvpxsa8kj" id="id-4zynvpxsa8kj"></a>

When building your Gooey.AI workflows, you will often have to tweak the settings to ensure that the responses show parity and are grounded and verifiable.

**There are several components to test:**

* Prompts
* Whether the synthetic data retrieval works
* Suitability of the language model and its advanced settings
* Latency of generated answers
* Whether the final AI Agent produces the Golden Answers
* Price per run
* Regressions
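To make the regression item concrete, here is a minimal sketch that compares per-question scores from two evaluation runs. It assumes scores between 0 and 1 keyed by a question identifier; the `tolerance` value, the sample scores, and the function `find_regressions` are hypothetical and chosen only for illustration.

```python
from statistics import mean

def find_regressions(baseline, candidate, tolerance=0.05):
    """Return questions whose score dropped by more than `tolerance`."""
    regressions = []
    for question, old_score in baseline.items():
        new_score = candidate.get(question)
        if new_score is None:
            continue  # question not present in the new run
        if old_score - new_score > tolerance:
            regressions.append((question, old_score, new_score))
    return regressions

# Illustrative scores from a previous run and a run after changing settings.
baseline = {"q1": 0.90, "q2": 0.80, "q3": 0.70}
candidate = {"q1": 0.92, "q2": 0.60, "q3": 0.68}

regressed = find_regressions(baseline, candidate)
mean_change = mean(candidate.values()) - mean(baseline.values())
```

Checking per-question drops as well as the mean matters because an average can hide one answer that got much worse.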

How can you do this at scale?

**This is where Gooey.AI’s Bulk and Evaluation features shine!**

### Features of Bulk Runner and Evaluation <a href="#eheq9i411cm3" id="eheq9i411cm3"></a>

* Run several models in one click
* Run several iterations of your workflows at scale
* Choose any of the API Response Outputs to populate your test
* Get output in CSV for further data analysis
* Built-in evaluation tool for quick analysis
* Use CSV or Google Sheets as input
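Once a bulk run is exported as CSV, the results can be summarized per model to weigh quality against time and cost. The sketch below assumes hypothetical column names (`model`, `score`, `latency_s`, `cost_usd`) and made-up rows; the actual columns depend on which outputs you choose to populate.

```python
import csv
import io
from collections import defaultdict

# Hypothetical bulk-run export; real column names depend on the outputs you select.
OUTPUT_CSV = """model,score,latency_s,cost_usd
model-a,0.90,2.0,0.010
model-a,0.70,4.0,0.012
model-b,0.80,1.0,0.004
model-b,0.60,1.0,0.004
"""

def summarize_by_model(text):
    """Average score and latency, plus total cost, per model from CSV text."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        groups[row["model"]].append(row)
    summary = {}
    for model, rows in groups.items():
        n = len(rows)
        summary[model] = {
            "mean_score": sum(float(r["score"]) for r in rows) / n,
            "mean_latency_s": sum(float(r["latency_s"]) for r in rows) / n,
            "total_cost_usd": sum(float(r["cost_usd"]) for r in rows),
        }
    return summary

summary = summarize_by_model(OUTPUT_CSV)
```

A summary like this puts score, latency, and cost side by side, so the trade-off between models is visible in one place.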

<table data-view="cards"><thead><tr><th></th><th data-hidden data-card-target data-type="content-ref"></th></tr></thead><tbody><tr><td><strong>Common terms in bulk and evaluation</strong></td><td><a href="https://docs.gooey.ai/guides/understanding-bulk-runner-and-evaluation#id-3yvzoyislzdo">https://docs.gooey.ai/guides/understanding-bulk-runner-and-evaluation#id-3yvzoyislzdo</a></td></tr></tbody></table>

