OnlyPlanz –

    Monetization

    A new AI coding challenge just published its first results – and they aren’t pretty

    By onlyplanz_80y6mt | July 24, 2025 | 3 Mins Read
    Blue code on a dark background presented at an angle.

    A new AI coding challenge has revealed its first winner – and set a new bar for AI-powered software engineers.

    On Wednesday at 5pm PST, the nonprofit Laude Institute announced the first winner of the K Prize, a multi-round AI coding challenge launched by Databricks and Perplexity co-founder Andy Konwinski. The winner was a Brazilian prompt engineer named Eduardo Rocha de Andrade, who will receive $50,000 for the prize. But more surprising than the win was his final score: he won with correct answers to just 7.5% of the questions on the test.

    “We’re glad we built a benchmark that is actually hard,” said Konwinski. “Benchmarks should be hard if they’re going to matter,” he continued, adding: “Scores would be different if the big labs had entered with their best models. But that’s kind of the point. K Prize runs offline with limited compute, so it favors smaller and open models. I like that. It levels the playing field.”

    Konwinski has pledged $1 million to the first open-source model that can score higher than 90% on the test.

    Similar to the well-known SWE-Bench system, the K Prize tests models against flagged issues from GitHub as a test of how well models can deal with real-world programming problems. But while SWE-Bench is based on a fixed set of problems that models can train against, the K Prize is designed as a “contamination-free version of SWE-Bench,” using a timed entry system to guard against any benchmark-specific training. For round one, models were due by March 12th. The K Prize organizers then built the test using only GitHub issues flagged after that date.
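The timed entry idea can be illustrated with a minimal Python sketch. This is not the K Prize organizers' code: the `Issue` record, the `contamination_free` helper, the example repositories, and the year in the deadline are all assumptions made for illustration (the article gives only the day, March 12th). It shows only the core rule, that an issue is eligible for the test if it was flagged after the submission deadline.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Issue:
    """Hypothetical, simplified record of a flagged GitHub issue."""
    repo: str
    number: int
    flagged_on: date


def contamination_free(issues, submission_deadline):
    """Keep only issues flagged strictly after the submission deadline."""
    return [i for i in issues if i.flagged_on > submission_deadline]


# Assumed year: the article states only "March 12th" for round one.
DEADLINE = date(2025, 3, 12)

# Made-up candidate issues, not real data.
candidates = [
    Issue("example/repo-a", 101, date(2025, 2, 28)),  # before the deadline
    Issue("example/repo-a", 117, date(2025, 3, 12)),  # on the deadline
    Issue("example/repo-b", 42, date(2025, 4, 3)),    # after the deadline
]

test_set = contamination_free(candidates, DEADLINE)
print([(i.repo, i.number) for i in test_set])
```

Because submitted models are frozen before any eligible issue exists, they cannot have been trained on the test problems, which is the property a fixed problem set cannot guarantee.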

    The 7.5% top score stands in marked contrast to SWE-Bench itself, which currently shows a 75% top score on its easier ‘Verified’ test and 34% on its harder ‘Full’ test. Konwinski still isn’t sure whether the disparity is due to contamination on SWE-Bench or just the challenge of collecting new issues from GitHub, but he expects the K Prize project to answer the question soon.

    “As we get more runs of the thing, we’ll have a better sense,” he told TechCrunch, “because we expect people to adapt to the dynamics of competing on this every few months.”

    It might seem like an odd place to fall short, given the wide range of AI coding tools already publicly available – but with benchmarks becoming too easy, many critics see projects like the K Prize as a necessary step toward solving AI’s growing evaluation problem.

    “I’m quite bullish about building new tests for existing benchmarks,” says Princeton researcher Sayash Kapoor, who put forward a similar idea in a recent paper. “Without such experiments, we can’t actually tell if the issue is contamination, or even just targeting the SWE-Bench leaderboard with a human in the loop.”

    For Konwinski, it’s not just a better benchmark, but an open challenge to the rest of the industry. “If you listen to the hype, it’s like we should be seeing AI doctors and AI lawyers and AI software engineers, and that’s just not true,” he says. “If we can’t even get more than 10% on a contamination-free SWE-Bench, that’s the reality check for me.”
