We ran a 6-week experiment across our team: half used Cursor, half used Copilot Enterprise. We measured PR output, self-reported friction, and code review comments.

Cursor group: +22% lines of non-test code per week, -15% review comments per PR (which we read as higher-quality first drafts), and higher satisfaction scores.
Copilot group: +11% lines of code per week, roughly flat review comments per PR, and moderate satisfaction.

Caveats: the sample was small, skill levels differed between the two groups, and Cursor is more expensive. The Cursor lead might also be a novelty effect, so we'll rerun the experiment in 6 months.

The qualitative finding that surprised us: the Cursor team reported spending more time on architecture decisions and less on boilerplate. Whether that shift is a cause or an effect of the better output, we don't know.