Introducing agent-qa: An Open-Source Agentic QA Harness with Execution Memory

The traditional approach to automated quality assurance (QA) often involves a fragile cycle of writing rigid scripts, dealing with brittle selectors, and spending countless hours fixing tests that break the moment a UI element shifts by a few pixels. As applications grow in complexity, the maintenance overhead of these tests frequently outweighs their value.

agent-qa by Vostride aims to solve this by introducing an agentic approach to testing. By combining natural language test definitions with a sophisticated execution memory system, it transforms QA from a static scripting exercise into a dynamic, self-healing process that evolves alongside the product.

Natural Language Test Authoring

At the core of agent-qa is the ability to define tests in natural language. Instead of writing complex Playwright or Selenium scripts, developers author tests as YAML files whose steps are written in plain English. For example, the steps of a test that verifies issue creation in Linear might look like this:

Click on the Create issue icon.
Verify that the Create issue modal is shown.
Enter "Fix mobile login" in the "Issue title" input field.
Select "Engineering" from the Team selector.
Click on the Create issue button.

Because the agent works from visible roles, labels, and the current screen state, it can interpret these instructions dynamically. This abstracts away the need for hard-coded CSS selectors or XPaths, making tests significantly more resilient to minor UI changes.
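The following minimal Python sketch makes "working from roles and labels" concrete. It assumes an LLM has already reduced each step to a target phrase such as "Create issue icon", and that the current screen is available as a list of elements, each with an accessibility role and an accessible name. The `Element` type, the `ROLE_WORDS` table, and `resolve_target` are hypothetical illustrations, not agent-qa's API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    role: str            # accessibility role, e.g. "button", "textbox"
    name: str            # accessible name, usually the visible label
    visible: bool = True


# Hypothetical mapping from the noun used in a step to an accessibility role.
ROLE_WORDS = {
    "button": "button",
    "icon": "button",
    "input field": "textbox",
    "selector": "combobox",
    "modal": "dialog",
}


def resolve_target(phrase: str, snapshot: list[Element]) -> Element | None:
    """Match a phrase like 'Create issue icon' to a visible element by role and label."""
    phrase = phrase.strip()
    lowered = phrase.lower()
    for word, role in ROLE_WORDS.items():
        suffix = " " + word
        if lowered.endswith(suffix):
            # Everything before the role noun is the label; drop surrounding quotes.
            label = phrase[: -len(suffix)].strip().strip('"').lower()
            for element in snapshot:
                if element.visible and element.role == role and element.name.lower() == label:
                    return element
            return None  # role noun recognized, but no visible element matches
    return None  # no recognizable role noun in the phrase
```

Because the lookup keys on role and accessible name rather than on DOM structure, moving the Create issue button into a different container would not change the result in this sketch.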

The Power of Execution Memory

A distinguishing feature of agent-qa is its Execution Memory. Many AI-driven testing tools treat every run as a blank slate, forcing the agent to rediscover the navigation model and UI patterns each time. agent-qa instead builds a knowledge base from observations at three scopes: product, suite, and test.

How Memory Works

As the agent runs tests, it curates "contracts" and observations. For instance, it might learn that "Sidebar groups stay visible after switching between Docs and Projects." This memory is then injected into future runs, allowing the agent to:

Avoid Redundancy: Skip the process of rediscovering how to navigate the workspace.
Improve Accuracy: Use confirmed patterns (e.g., knowing that a command palette requires an exact title search) to avoid common pitfalls.
Reduce False Failures: Recognize that certain elements (such as a page toolbar) appear only after a specific action, so the agent does not report a not-yet-visible element as a failure.
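One way to picture such a scoped memory is sketched below in Python. The three scopes (product, suite, test) come from the description above; the rule that an observation becomes a contract once it has been seen `threshold` times, and the bracketed text format used for injection, are assumptions made for illustration. `ExecutionMemory` and its methods are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass

SCOPES = ("product", "suite", "test")  # broadest to narrowest


@dataclass
class Observation:
    scope: str
    key: str        # which product, suite, or test the note belongs to
    text: str
    sightings: int = 1


class ExecutionMemory:
    """Hypothetical scoped memory: notes seen often enough become 'contracts'."""

    def __init__(self, threshold: int = 2) -> None:
        self.threshold = threshold
        self._notes: dict[tuple[str, str, str], Observation] = {}

    def observe(self, scope: str, key: str, text: str) -> None:
        if scope not in SCOPES:
            raise ValueError(f"unknown scope: {scope!r}")
        ident = (scope, key, text)
        if ident in self._notes:
            self._notes[ident].sightings += 1
        else:
            self._notes[ident] = Observation(scope, key, text)

    def context_for(self, product: str, suite: str, test: str) -> str:
        """Render the contracts relevant to one test, broadest scope first."""
        wanted = {"product": product, "suite": suite, "test": test}
        lines = [
            f"[{note.scope}] {note.text}"
            for scope in SCOPES
            for note in self._notes.values()
            if note.scope == scope
            and note.key == wanted[scope]
            and note.sightings >= self.threshold
        ]
        return "\n".join(lines)
```

The rendered string is what would be added to the agent's prompt on the next run; a note seen only once stays out until it recurs, which keeps one-off glitches from hardening into contracts.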

Self-Healing and Performance Optimization

To combat the "flakiness" that plagues automated testing, agent-qa implements Self-Healing Execution. If a sub-action, such as a click or a fill, fails, the agent does not immediately fail the test. Instead, it re-observes the UI and tries to find an alternative path to the same goal within the same run.
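The retry loop below is a minimal sketch of this behavior, assuming the caller supplies two hypothetical callbacks: `observe`, which takes a fresh look at the UI and returns candidate targets, and `act`, which performs the sub-action and raises `ActionFailed` on error. It is not agent-qa's implementation; it only illustrates the re-observe-and-retry pattern.

```python
from __future__ import annotations

from typing import Callable


class ActionFailed(Exception):
    """Raised by a sub-action (click, fill, ...) that could not complete."""


def run_step_with_healing(
    goal: str,
    observe: Callable[[], list[str]],
    act: Callable[[str], None],
    max_attempts: int = 3,
) -> str:
    """Try a sub-action; on failure, re-observe the UI and try another target."""
    tried: set[str] = set()
    last_error: ActionFailed | None = None
    for _ in range(max_attempts):
        # Take a fresh look at the screen each time; skip targets that already failed.
        candidates = [c for c in observe() if c not in tried]
        if not candidates:
            break
        target = candidates[0]
        tried.add(target)
        try:
            act(target)
            return target
        except ActionFailed as exc:
            last_error = exc
    raise ActionFailed(f"could not achieve {goal!r} after {len(tried)} attempt(s)") from last_error
```

Re-observing on every attempt, rather than reusing the first snapshot, matters because a failed action often changes the screen itself, for example when an element is re-rendered or a modal opens.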

To keep these agentic capabilities from causing prohibitive latency or cost, the platform includes a Smart Cache. By reusing validated action plans across similar subsequent runs, agent-qa reduces planner work and token usage, reportedly making execution about 5x faster (e.g., cutting a 42s run to 8s).
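A plan cache of this kind can be sketched as follows. The key combines the test, the step, and a fingerprint of the visible UI, so a changed screen forces a fresh plan; a plan is stored only after it executes successfully and is evicted as soon as a cached copy fails. The fingerprint scheme, `PlanCache`, and `run_step` are hypothetical, and `planner` stands in for the LLM call the cache is meant to avoid.

```python
from __future__ import annotations

import hashlib
import json
from typing import Callable

Snapshot = list[tuple[str, str]]  # (role, accessible name) pairs visible on screen


def ui_fingerprint(snapshot: Snapshot) -> str:
    """Order-insensitive hash of the visible elements."""
    canonical = json.dumps(sorted(snapshot))
    return hashlib.sha256(canonical.encode()).hexdigest()


class PlanCache:
    def __init__(self) -> None:
        self._plans: dict[tuple[str, str, str], list[str]] = {}

    def _key(self, test_id: str, step: str, snapshot: Snapshot) -> tuple[str, str, str]:
        return (test_id, step, ui_fingerprint(snapshot))

    def lookup(self, test_id: str, step: str, snapshot: Snapshot) -> list[str] | None:
        return self._plans.get(self._key(test_id, step, snapshot))

    def store(self, test_id: str, step: str, snapshot: Snapshot, plan: list[str]) -> None:
        self._plans[self._key(test_id, step, snapshot)] = list(plan)

    def evict(self, test_id: str, step: str, snapshot: Snapshot) -> None:
        self._plans.pop(self._key(test_id, step, snapshot), None)


def run_step(
    cache: PlanCache,
    test_id: str,
    step: str,
    snapshot: Snapshot,
    planner: Callable[[str, Snapshot], list[str]],
    execute: Callable[[list[str]], bool],
) -> tuple[bool, bool]:
    """Return (reused_cached_plan, succeeded)."""
    plan = cache.lookup(test_id, step, snapshot)
    reused = plan is not None
    if plan is None:
        plan = planner(step, snapshot)  # the expensive call (an LLM planner in practice)
    ok = execute(plan)
    if ok:
        cache.store(test_id, step, snapshot, plan)  # only validated plans are kept
    else:
        cache.evict(test_id, step, snapshot)  # a plan that just failed is never reused
    return reused, ok
```

For reference, cutting a 42s run to 8s corresponds to a 42 / 8 = 5.25x speedup.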

Developer-Centric Infrastructure

Despite being powered by LLMs, agent-qa is designed to fit into a professional software development lifecycle (SDLC):

Version Controlled: Tests, configurations, and hooks are stored as code, so they can be diffed and reviewed in pull requests.
Sandboxed Hooks: Users can run Node, Bun, Python, or Bash hooks in isolated Docker containers to seed fixtures, call APIs, or tear down state.
LLM Agnostic: The system supports a wide array of providers, including OpenAI, Anthropic, Gemini, and local models via Ollama or LM Studio.
Machine Readable: Through MCP (Model Context Protocol) and specific skills, other coding agents can discover schemas and triage failures automatically.
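As an illustration of the sandboxed hooks, the Python sketch below builds a `docker run` command for a single hook script. The flags used (`--rm`, `-v` with `:ro`, `-w`, `--network none`, `-e`) are standard Docker CLI options, but the runtime-to-image table, the `/hooks` mount point, and `hook_command` itself are assumptions, not agent-qa's actual configuration.

```python
from __future__ import annotations

# Hypothetical runtime table: public image plus the command that runs a script.
RUNTIMES: dict[str, tuple[str, list[str]]] = {
    "node": ("node:20-alpine", ["node"]),
    "bun": ("oven/bun:1", ["bun", "run"]),
    "python": ("python:3.12-slim", ["python"]),
    "bash": ("bash:5", ["bash"]),
}


def hook_command(
    runtime: str,
    hooks_dir: str,
    script: str,
    env: dict[str, str] | None = None,
    allow_network: bool = True,
) -> list[str]:
    """Build a `docker run` argv that executes one hook in a throwaway container."""
    image, interpreter = RUNTIMES[runtime]
    argv = [
        "docker", "run", "--rm",         # delete the container when the hook exits
        "-v", f"{hooks_dir}:/hooks:ro",  # hooks_dir must be an absolute host path
        "-w", "/hooks",                  # run the script from the mounted directory
    ]
    if not allow_network:
        argv += ["--network", "none"]    # no network access inside the container
    for name, value in sorted((env or {}).items()):
        argv += ["-e", f"{name}={value}"]
    return argv + [image, *interpreter, script]
```

The returned list can be passed to `subprocess.run(argv, check=True)`. Network access defaults to on because hooks that call APIs need it; turning it off is a cheap extra layer of isolation for hooks that only read mounted files.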

Conclusion

By shifting the focus from how to interact with the UI to what the desired outcome is, agent-qa reduces the friction of maintaining a comprehensive test suite. While the tool currently concentrates on frontend and mobile interactions, its combination of execution memory and self-healing provides a blueprint for more resilient, autonomous quality assurance.