Launching today
TryCase
Disposable test environments for AI coding agents
160 followers
TryCase gives AI coding agents disposable Linux environments to run apps, test changes end to end, capture screenshots and recordings, and return verified code instead of asking you to test manually.

Hey Product Hunt,
I’m Ben, and I’m building TryCase.
This came from my own workflow. I’ll often have a bunch of agents running at once across different worktrees, each trying changes, spinning up the app, testing, iterating, taking screenshots, or recording proof.
That gets messy quickly. My laptop becomes the bottleneck, ports collide, installs overlap, browser sessions get reused in weird ways, and I still end up doing a lot of the final verification myself.
So I built TryCase to give each agent its own disposable Linux environment in the cloud. The agent can run the app, test the change end to end, capture screenshots or video, and come back with proof instead of just code.
It’s also useful for longer-running tasks. You can give an agent a goal, let it work inside a clean disposable environment, and ask it to come back with screenshots, logs, and recordings. Secrets can be passed in deliberately, and each run is isolated from your laptop and from other agents.
TryCase is still early, but the goal is simple: agents should only say “done” once they’ve actually run and verified the work.
It’s easy to try. Just ask your coding agent to use TryCase at trycase.dev:
- Fix this bug, test it end to end with TryCase, iterate until it works, and send me screenshots and a video recording as proof.
- Implement this feature, run the app in TryCase, iterate on any failures, and prove it’s working with screenshots, logs, and a recording.
- Use TryCase to run this repo in a clean environment, verify the main flow, and show me what the app looks like.
- Test this branch in TryCase, find anything that breaks, fix it, and prove the final version works.
- If manual login is needed, use desktop mode and give me the take-control link.
I’d love feedback from people using Codex, Claude Code, Cursor, or other coding agents. What would make you trust an agent’s “done” more?
"agents should only say done once they've actually run and verified the work" is exactly right and it's the thing i keep running into with my own multi-agent setups - an agent will confidently report success based on its own transcript when the thing it was supposed to do never actually happened underneath. running multiple worktrees locally does turn into port collisions and reused browser sessions pretty fast like you said. does the screenshot/recording proof get attached anywhere the agent itself can't fake or reword, or is it still on me to actually look at the video rather than trust the agent's summary of it?
@omri_ben_shoham1 Screenshots and recordings can be downloaded from the CLI, but someone still needs to verify them. In my workflow, I usually have an agent handle the verification, and then I quickly skim through the recording it surfaces.
So far, I've found GPT-5.5 to be pretty reliable for this.
I'm curious what your ideal workflow would look like. Right now, TryCase is intentionally minimal and only exposes tools that agents can use. It doesn't have its own built-in agent yet, so it relies on whatever agent you're using.
@ben_chomsang Could you please add a review step for the recordings and screenshots to the AI? It should verify whether they are correct, and if it finds any issues, it should retry the task automatically.
Agent workflows produce a large number of screenshots and recordings, so manually reviewing each one is very time-consuming. Adding this capability would make your product much more useful and efficient than it is now.
the 'done only after it actually ran and verified' bar is the right one. do you surface the failed attempts too, or just the final passing proof?
@andrewzakonov Right now, this depends on the agent deciding what to keep track of. For example, if it's instructed to record failed attempts, you'll be able to review the full history of runs and their artifacts. My goal is to keep the tool flexible by providing a small set of focused commands that the agent can use as needed. I'm curious if you think this is the right approach, or if you'd prefer a single tool that always behaves the same way and handles everything for you?
The 'an agent handles verification' step is where I've watched this quietly break. When the same model family does the work and the check, the verifier tends to trust the doer's framing of what success looks like, so it happily confirms a screenshot of the wrong screen. What helped me was feeding the verification agent only the original task spec plus the artifact, never the doer's transcript, so it can't inherit the optimistic story of what happened. Does TryCase hand the checker the full run log, or just the recording and a fresh prompt?
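A minimal sketch of that separation, in Python. Nothing here is TryCase's API: `RunRecord`, `build_verifier_input`, and the field names are hypothetical, and the only point is that the checker's input is built from the task spec and the artifact paths, with the doer's transcript left out by construction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunRecord:
    """Hypothetical record of one agent run (not a TryCase type)."""
    task_spec: str                   # the original task, as given to the doer
    doer_transcript: str             # the doer's own account of what happened
    artifact_paths: tuple[str, ...]  # screenshots, recordings, logs


def build_verifier_input(run: RunRecord) -> dict:
    """Build the checker's input from the spec and artifacts only.

    The transcript is deliberately never copied into the payload, so the
    checker cannot inherit the doer's framing of what success looks like.
    """
    return {
        "task_spec": run.task_spec,
        "artifacts": list(run.artifact_paths),
        "instructions": (
            "Using only the artifacts, decide whether the task spec is "
            "satisfied. Answer PASS or FAIL and give a reason."
        ),
    }
```

The resulting dictionary would then be handed to whichever model does the checking, ideally in a fresh session with no shared history.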
Screenshots and recordings as verification artifacts are useful for UI changes, but for backend logic or API behavior the visual output doesn't tell you much about whether the code actually works correctly. What does TryCase return as verification evidence for non-visual changes, like a database write, a webhook handler, or a background job, and how does the agent know the difference between "it ran without error" and "it did the right thing"?
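One way to picture the gap between "it ran without error" and "it did the right thing" is a check that inspects the resulting state rather than the exit status. The sketch below is an illustration under assumptions, not what TryCase returns: the `payments` table, both handlers, and `run_check` are hypothetical, and it uses an in-memory SQLite database from the Python standard library.

```python
import sqlite3


def good_handler(conn, payload):
    """Hypothetical webhook handler: stores the amount as received."""
    conn.execute(
        "INSERT INTO payments (event_id, amount_cents) VALUES (?, ?)",
        (payload["id"], payload["amount_cents"]),
    )
    conn.commit()


def buggy_handler(conn, payload):
    """Hypothetical handler with a unit bug: it raises nothing, but
    stores dollars in a column that is meant to hold cents."""
    conn.execute(
        "INSERT INTO payments (event_id, amount_cents) VALUES (?, ?)",
        (payload["id"], payload["amount_cents"] // 100),
    )
    conn.commit()


def run_check(handler, payload):
    """Run the handler, then collect evidence from the database itself."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE payments (event_id TEXT PRIMARY KEY, amount_cents INTEGER)"
    )
    handler(conn, payload)  # reaching the next line means no exception
    row = conn.execute(
        "SELECT amount_cents FROM payments WHERE event_id = ?",
        (payload["id"],),
    ).fetchone()
    conn.close()
    return {
        "row_found": row is not None,
        "amount_matches": row is not None and row[0] == payload["amount_cents"],
    }
```

Both handlers complete without raising; only the state check separates them, because `buggy_handler` leaves `amount_matches` false.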
Such a smart concept for vibe coding. Testing AI-generated code safely without messing up my local environment is always a massive concern. Can these disposable environments be spun up locally, or is the platform entirely cloud-based?
congrats on the launch ben. the port collision thing you describe hit me the first time i let two agents run dev servers at once, so the throwaway box approach makes a lot of sense. one thing i'm curious about: booting an app end to end usually means real env vars, api keys, db urls. where do those secrets live while a sandbox runs, and does teardown wipe them for good?
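As a local analogy for passing secrets in deliberately, here is a hedged Python sketch. It does not describe how TryCase stores or wipes secrets: `run_isolated` and its parameters are hypothetical, it assumes a Linux-like system, and a child process with an allowlisted environment plus a temporary directory only stands in for a real sandbox.

```python
import os
import subprocess
import tempfile


def run_isolated(command, secrets, allowed):
    """Run `command` with only the allowlisted secrets in its environment.

    The child gets PATH plus the names listed in `allowed`; nothing else
    from the parent environment or from `secrets` is passed through.
    A name in `allowed` that is missing from `secrets` raises KeyError.
    """
    env = {"PATH": os.environ.get("PATH", "")}
    env.update({name: secrets[name] for name in allowed})

    # The working directory exists only for the duration of the run and
    # is deleted when the `with` block exits.
    with tempfile.TemporaryDirectory() as workdir:
        result = subprocess.run(
            command,
            env=env,
            cwd=workdir,
            capture_output=True,
            text=True,
            check=False,
        )
    return result.stdout, workdir
```

Calling it with `allowed=["API_KEY"]` means a `DB_URL` entry in `secrets` never reaches the child process, and the scratch directory is gone once the call returns.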