<p>We're tackling the challenge of refactoring code to make it more organized and efficient. Imagine you have several pieces of code that do similar things. Our goal is to create a unified library that captures the common patterns, significantly reducing the total code size while still supporting all original functionalities. This not only makes the code base smaller but also helps uncover new ways to use the code.</p>
109
+
<p>We study the problem of refactoring code for better organization and efficiency. Given multiple codebases with similar functionalities, our goal is to create a unified library that captures common patterns.
110
+
This process should significantly reduce the total amount of code while ensuring all original functionality remains intact.
111
+
112
+
</p>
113
+
114
+
<p>We evaluate refactorings based on two key principles:
110
115
111
-
<h3>How We Do It</h3>
112
-
<p>Our approach focuses on finding refactorings that are both correct and simple.</p>
113
116
<ul>
114
-
<li><strong>Correctness</strong> is straightforward: Does the refactored code pass all the original tests?</li>
115
-
<li><strong>Simplicity</strong> is more nuanced. We don't just count characters; we define simplicity using <strong>Minimum Description Length (MDL)</strong>. This means we're looking for code that is not only short but also natural, elegant, and extensible—like finding the most concise yet understandable way to express an idea, rather than just the shortest, potentially unreadable, version (think "Perl Golf" where the shortest code is often incomprehensible!).</li>
117
+
<li><strong><span class="highlight highlight-green">Correctness</span> is straightforward</strong>: Does the refactored code pass all the original tests?</li>
118
+
<li><strong><span class="highlight highlight-blue">Simplicity</span> is more nuanced</strong>: We don't just count characters; we define simplicity using <strong class="highlight highlight-orange">Minimum Description Length (MDL)</strong>. This means we're looking for code that is not only short but also natural, elegant, and extensible—like finding the most concise yet understandable way to express an idea, rather than just the shortest, potentially unreadable, version (think "Perl Golf" where the shortest code is often incomprehensible!).</li>
116
119
</ul>
117
120
118
121
@@ -157,6 +160,66 @@ <h3>How It Works:</h3>
157
160
</ul>
158
161
</section>
159
162
163
+
<section>
164
+
<h2>The MINICODE Benchmark</h2>
165
+
<p>
166
+
MINICODE evaluates a <strong class="highlight highlight-blue">code agent's</strong> capability to identify abstractions across multiple implementations and design reusable <strong class="highlight highlight-orange">libraries</strong>. Agents are presented with a collection of code sources and are tasked with refactoring them into a unified library. Key desiderata for these collections are that they must be <strong class="highlight highlight-blue">compressible</strong>, containing a latent shared abstraction, and <strong class="highlight highlight-blue">verifiable</strong>, allowing functional correctness to be measured. Agents interact with the benchmark via the terminal, managing multi-package Python repositories.
167
+
</p>
168
+
169
+
<h3>CodeContests Domain</h3>
170
+
<p>
171
+
Sourced from the CodeContests dataset, this domain uses competitive programming problems which naturally contain shared concepts and test cases. Each collection provides multiple solutions, and the agent's task is to create a central <code>library.py</code> file that is imported by each refactored solution.
172
+
</p>
173
+
174
+
<h3>Repositories Domain</h3>
175
+
<p>
176
+
This domain features synthesized projects with controlled complexity and overlap. Using a generative process, we create collections of repositories tailored to specific use cases. Agents must extract reusable functions from across these repositories and rewrite the original source code to use a new, shared <code>common</code> subpackage.