SuperNote Tuning

** UNDER CONSTRUCTION - with AI - so some stuff can be "off message" **

movie lines utility:
supernote25.com/movielines


For educational reference on AI:
deeplearning.ai





Using Sheets to run batches and compare agent revisions

For a semantic UI to cover an application with multiple screens effectively, the test set may need 10,000 or more questions to account for the diverse ways users (and other AI agents) will interact with it. Regression analysis that groups questions by their expected responses gives a high-level overview of the AI agent's stability. The example spreadsheet below shows initial responses to questions resolved by SMEs (subject matter experts: people who can source the "truth").

INSERT SPREADSHEET IMAGE

Initial Analysis: The first batch of results is assessed by humans to establish a baseline of correctness. This initial assessment serves as the "truth" against which subsequent builds are compared.

Dynamic Truth: However, the "truth" isn't always static. If a later build produces a response that is deemed more accurate, that response becomes the new benchmark for comparison.

Build-to-Build Comparison: The process focuses on identifying changes between builds. This is crucial because a single change can affect many, or even all, outputs, and such widespread effects are hard to analyze. The "truth" for comparison is therefore typically the set of results from the most recent build, which limits each comparison to one build's worth of changes.
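The build-to-build comparison described above can be sketched in a few lines of Python. This is a minimal illustration only: it assumes each build's results have been exported from the spreadsheet as a mapping from question text to response text, and the names diff_builds, build_41 and build_42 and the sample questions are hypothetical.

```python
def diff_builds(previous, current):
    """Return (question, old_response, new_response) for every response that changed.

    Both arguments are dicts mapping question text to response text
    (an assumed export format, not something defined by this guide).
    Questions that exist only in the current build are ignored here.
    """
    changes = []
    for question in sorted(previous):
        old = previous[question].strip()
        # A question missing from the current build counts as a change to "".
        new = current.get(question, "").strip()
        if old != new:
            changes.append((question, old, new))
    return changes


# Hypothetical results from two consecutive builds.
build_41 = {
    "How do I reset my password?": "Open Settings > Account.",
    "Where is the export button?": "Top right of the Reports screen.",
}
build_42 = {
    "How do I reset my password?": "Open Settings > Account.",
    "Where is the export button?": "In the File menu.",
}

changed = diff_builds(build_41, build_42)
for question, old, new in changed:
    print(f"{question}\n  was: {old}\n  now: {new}")
```

Every row this reports still needs a human decision: either the new response is a regression, or it is more accurate and becomes the new "truth".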

The Importance of Regression Analysis: Regression analysis between builds is essential. It allows us to incorporate insights into the next set of training instructions. Without this analysis, we risk introducing unexpected variance across a wide range of responses.

The Goal of Regression: The core purpose of regression is to monitor for variance and refine the system to consistently achieve 100% accuracy in matching expected results.
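As a rough illustration of monitoring accuracy against expected results, the sketch below groups rows by expected response and reports the fraction of matches in each group. It assumes a match means the same text after trimming whitespace and ignoring case; in practice an SME may need to judge equivalence. The function name accuracy_by_group and the sample rows are hypothetical.

```python
from collections import defaultdict


def accuracy_by_group(rows):
    """Given (expected, actual) pairs, return {expected: fraction of matches}.

    Matching is a simple case-insensitive string comparison, which is an
    assumption made for this sketch.
    """
    totals = defaultdict(int)
    matches = defaultdict(int)
    for expected, actual in rows:
        totals[expected] += 1
        if actual.strip().lower() == expected.strip().lower():
            matches[expected] += 1
    return {group: matches[group] / totals[group] for group in totals}


# Hypothetical batch: each row is (expected response, response from this build).
rows = [
    ("Open Reports", "Open Reports"),
    ("Open Reports", "open reports"),
    ("Open Reports", "Open Settings"),
    ("Open Settings", "Open Settings"),
]

scores = accuracy_by_group(rows)
for group, score in sorted(scores.items()):
    print(f"{group}: {score:.0%}")
```

A group that drops below 100% after a build is a signal to look at that group's questions first.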

Coverage vs. Resolution: Although a larger number of questions and tests gives finer resolution, it is not necessary to retain every one of them. Within a batch, individual tests may fluctuate between correct and incorrect status from build to build.

Focus on Edge Cases: Some questions will consistently produce fluctuating labels. These "edge cases" are the ones that vary between builds and require attention. They are critical to retain in your coverage.
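One simple way to surface these edge cases is to count how often each question's label flips across builds. The sketch below assumes a per-question history of labels in build order; the threshold of two flips, the label values and the names count_flips and find_edge_cases are illustrative assumptions.

```python
def count_flips(labels):
    """Count how many times the label changes between consecutive builds."""
    return sum(1 for a, b in zip(labels, labels[1:]) if a != b)


def find_edge_cases(history, min_flips=2):
    """Return the questions whose label flipped at least min_flips times.

    history maps question text to a list of labels, one per build, oldest
    first. The default threshold is an assumption, not a rule from this guide.
    """
    return sorted(q for q, labels in history.items() if count_flips(labels) >= min_flips)


# Hypothetical label history over four builds.
history = {
    "Where is the export button?": ["correct", "incorrect", "correct", "incorrect"],
    "How do I reset my password?": ["correct", "correct", "correct", "correct"],
    "Can I share a report?": ["incorrect", "correct", "correct", "correct"],
}

edge_cases = find_edge_cases(history)
print(edge_cases)
```

A question that changed once and then stayed stable is not flagged; only questions that keep moving are kept as edge cases.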

Quality Over Quantity: While more coverage is generally beneficial, having good quality coverage of these edge cases is essential for achieving optimal results.


In summary: The regression process is a dynamic cycle of analysis, comparison, and refinement. It's about establishing a "truth", challenging it, and continuously refining the system to achieve the highest possible accuracy. The focus is on understanding the impact of changes, managing variance, and paying special attention to those tricky edge cases that often hold the key to unlocking further improvements.


Agile hybriding: Previously, regression required code changes for each new build and batch. Now, users can modify prompts directly in a spreadsheet and use a Sheets add-on to run batches and import results. Cloud-based snapshots allow a run to be recreated precisely without code access or proprietary tools. Although the process benefits from matrixing techniques for scalability (tracing the fewest cases for maximum accuracy), it is essentially a brute-force method with many possible optimizations, all of which non-developers can manage within the spreadsheet. This accessibility lets entry-level professionals contribute meaningfully while developing valuable skills.

Prompt Tuning Guide

Welcome to the prompt tuning process! Your contributions are valuable as you learn and grow with us.

Directions

PROMPT TUNING TASKS

Find your tasks and instructions on the Resolution Tasks list sheet. The queue is prioritized and should be addressed promptly.

This queue contains unexpected changes that need resolution after a new build and regression analysis.

Once the queue is empty, you can generate new coverage using the application. Please be thoughtful and efficient in creating new cases.

The queue is refreshed daily at 7 a.m. Pacific Time. Pay close attention to updates, especially during final closedown.

General Guidance, Tasks, and Plan

Unexpected Change Resolution Guidelines

Expected vs. Unexpected Changes

Changes are a normal part of the process, and we will provide notes on what to expect in each build.

Use this information to determine whether an output change is anticipated or unexpected based on the build changes.

Apply careful thought and patience. This process ensures control as we refine our dynamic engine and stabilize results.
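A first-pass sort of changed outputs into expected and unexpected could look like the sketch below. It assumes, purely for illustration, that each question carries a set of topic tags and that the build notes list the tags a build is expected to affect; neither convention is defined in this guide, and the name triage_changes is hypothetical. The result is only a starting point for human review.

```python
def triage_changes(changed_questions, question_tags, expected_tags):
    """Split changed questions into (expected, unexpected) lists.

    A change is treated as expected when the question shares at least one
    tag with the build notes. Both the tags and the notes format are
    assumptions made for this sketch.
    """
    expected, unexpected = [], []
    for question in changed_questions:
        if question_tags.get(question, set()) & expected_tags:
            expected.append(question)
        else:
            unexpected.append(question)
    return expected, unexpected


# Hypothetical inputs: the build notes say only "export" behavior should change.
question_tags = {
    "Where is the export button?": {"export", "reports"},
    "How do I reset my password?": {"account"},
}
changed_questions = ["Where is the export button?", "How do I reset my password?"]

expected, unexpected = triage_changes(changed_questions, question_tags, {"export"})
print("expected:", expected)
print("unexpected:", unexpected)
```

Questions in the unexpected list are the ones to resolve through the queue.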

If you find any errors in these instructions, please correct them.

Focus on understanding and executing the process as described.