# Your first evals!

### What is a Golden Q\&A?

A Golden Q\&A is a list of common questions paired with accurate answers, written by experts on your team. It is used to test your AI Agent's performance.
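
As a rough illustration, a Golden Q\&A can be kept as a simple two-column sheet. The sketch below assumes hypothetical column names (`question`, `golden_answer`) and made-up rows; your own sheet may use different headers.

```python
import csv
import io

# Hypothetical Golden Q&A sheet; column names and rows are illustrative only.
GOLDEN_CSV = """question,golden_answer
How often should I water tomato seedlings?,Water when the top inch of soil feels dry.
What is the clinic's opening time?,The clinic opens at 9 am on weekdays.
"""

def load_golden_qna(text):
    """Parse CSV text into a list of dicts with 'question' and 'golden_answer'."""
    rows = list(csv.DictReader(io.StringIO(text)))
    # Drop rows with a missing question or answer so the test set stays clean.
    return [
        r for r in rows
        if (r.get("question") or "").strip() and (r.get("golden_answer") or "").strip()
    ]

golden = load_golden_qna(GOLDEN_CSV)
print(len(golden), "golden pairs loaded")
```

The same loader works on a CSV exported from Google Sheets, provided the header row matches the column names you choose.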

### Bulk Evaluation Process (Overview)

* The Golden Q\&A sheet is used as the test set in the [bulk evaluator](https://gooey.ai/bulk).
* For each question, the AI Agent generates an answer.
* The evaluator compares the AI Agent answer to the expert (golden) answer.
* Scores are based on technical accuracy, citation correctness, and answer quality.
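
The steps above can be sketched in plain Python. This is not the Gooey.AI evaluator: `fake_agent` and `token_f1` are hypothetical stand-ins, and token overlap is a deliberately crude substitute for the evaluator's scoring of accuracy, citations, and answer quality.

```python
import re
from collections import Counter

def tokens(text):
    """Split text into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def token_f1(candidate, golden):
    """Token-overlap F1 between two answers, in the range [0, 1]."""
    c, g = Counter(tokens(candidate)), Counter(tokens(golden))
    overlap = sum((c & g).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(c.values())
    recall = overlap / sum(g.values())
    return 2 * precision * recall / (precision + recall)

def fake_agent(question):
    """Hypothetical stand-in for the AI Agent; returns canned answers."""
    canned = {"What is the clinic's opening time?": "The clinic opens at 9 am."}
    return canned.get(question, "I don't know.")

def bulk_evaluate(golden_pairs, agent):
    """Generate an answer per question and score it against the golden answer."""
    results = []
    for question, golden_answer in golden_pairs:
        answer = agent(question)
        results.append({
            "question": question,
            "answer": answer,
            "score": token_f1(answer, golden_answer),
        })
    return results

golden_pairs = [
    ("What is the clinic's opening time?", "The clinic opens at 9 am on weekdays."),
    ("Is parking free?", "Yes, parking is free for visitors."),
]
results = bulk_evaluate(golden_pairs, fake_agent)
for row in results:
    print(f"{row['score']:.2f}  {row['question']}")
```

Because the agent is passed in as a function, the same loop can be rerun with a different agent or scoring function to compare setups.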

<figure><img src="https://2450152260-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FNWqgWAjD0VVJgjYDpsN5%2Fuploads%2FnNu3lzK09kIU6i5ag9t9%2Fimage.png?alt=media&#x26;token=915b3b03-a449-4eee-b2de-3991ef6c4d91" alt=""><figcaption></figcaption></figure>

<figure><img src="https://2450152260-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FNWqgWAjD0VVJgjYDpsN5%2Fuploads%2FgylgPi2vgS78n0i36fQ3%2Fimage.png?alt=media&#x26;token=24119a3a-388f-4ca0-9e67-58c37e2c320f" alt=""><figcaption></figcaption></figure>

## Why are bulk runs and evaluations important?

Bulk runs and evaluations help you:

* Choose the right
  * LLM
  * TTS
  * STT
  * Translations
* Improve your AI Agent's overall responses
* Assess time versus cost for your choice of pipeline
* <mark style="background-color:green;">Check for regressions regularly</mark>
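
A regression check in this setting can be as simple as comparing per-question scores from a saved baseline run against the latest run. The sketch below assumes scores in the range 0 to 1 and a hypothetical tolerance of 0.05; neither is a Gooey.AI default, and the question IDs and scores are made up.

```python
def find_regressions(baseline, latest, tolerance=0.05):
    """Return (question, old_score, new_score) where the score fell by more than `tolerance`."""
    regressions = []
    for question, old_score in baseline.items():
        new_score = latest.get(question)
        if new_score is None:
            continue  # question missing from the latest run; handle separately
        if old_score - new_score > tolerance:
            regressions.append((question, old_score, new_score))
    return regressions

# Made-up scores keyed by a hypothetical question ID.
baseline = {"q1": 0.90, "q2": 0.80, "q3": 0.70}
latest = {"q1": 0.92, "q2": 0.60, "q3": 0.68}

for question, old, new in find_regressions(baseline, latest):
    print(f"{question}: {old:.2f} -> {new:.2f}")
```

The tolerance keeps small run-to-run fluctuations from being flagged; how large it should be depends on how noisy your scores are.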

### Why do you need a bulk runner and evaluations? <a href="#id-4zynvpxsa8kj" id="id-4zynvpxsa8kj"></a>

When building your Gooey.AI workflows, you will often need to tweak the settings so that the responses show parity and are grounded and verifiable.

**There are several components to test:**

* testing prompts
* ensuring the synthetic data retrieval works
* checking the suitability of the language model and its advanced settings
* measuring the latency of generated answers
* evaluating whether the final AI Agent produces the Golden Answers
* evaluating the price per run
* running regression tests

How can you do this at scale?

**This is where Gooey.AI’s Bulk and Evaluation features shine!**

### Features of Bulk Runner and Evaluation <a href="#eheq9i411cm3" id="eheq9i411cm3"></a>

* Run several models in one click
* Run several iterations of your workflows at scale
* Choose any of the API Response Outputs to populate your test
* Get output in CSV for further data analysis
* Built-in evaluation tool for quick analysis
* Use CSV or Google Sheets as input
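
Once a bulk run is exported as CSV, comparing models on quality, latency, and cost takes only a few lines. The sketch below assumes a hypothetical export with `model`, `score`, `latency_s`, and `cost_usd` columns and made-up values; the real export's headers will depend on the outputs you select.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical bulk-run export; headers and values are made up.
OUTPUT_CSV = """model,score,latency_s,cost_usd
model_a,0.90,2.0,0.010
model_a,0.70,4.0,0.010
model_b,0.80,1.0,0.002
model_b,0.60,1.0,0.002
"""

def summarize(text):
    """Group rows by model and compute mean score, mean latency, and total cost."""
    by_model = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        by_model[row["model"]].append(row)
    summary = {}
    for model, rows in by_model.items():
        summary[model] = {
            "mean_score": mean(float(r["score"]) for r in rows),
            "mean_latency_s": mean(float(r["latency_s"]) for r in rows),
            "total_cost_usd": sum(float(r["cost_usd"]) for r in rows),
        }
    return summary

summary = summarize(OUTPUT_CSV)
for model, stats in summary.items():
    print(model, stats)
```

Putting the three numbers side by side per model makes the time versus cost trade-off explicit before you commit to a pipeline.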

<table data-view="cards"><thead><tr><th></th><th data-hidden data-card-target data-type="content-ref"></th></tr></thead><tbody><tr><td><strong>Common terms in bulk and evaluation</strong></td><td><a href="https://docs.gooey.ai/guides/understanding-bulk-runner-and-evaluation#id-3yvzoyislzdo">https://docs.gooey.ai/guides/understanding-bulk-runner-and-evaluation#id-3yvzoyislzdo</a></td></tr></tbody></table>
