<li>Correct: The refactored code passes all original tests.</li>
<li>Simple: Elegant code is short and natural.</li>
</ol>
We measure correctness by ensuring refactored code passes at least as many tests as the original sources, and simplicity via the <a href="https://en.wikipedia.org/wiki/Minimum_description_length">minimum description length (MDL)</a>. MDL, essentially the total negative log probability of all code under a model, captures both shortness and naturalness. This avoids the pitfalls of <a href="https://en.wikipedia.org/wiki/Code_golf">code golf</a>, where shortness is achieved via code obfuscation.
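<p>As a rough illustration (not the benchmark's actual scoring code), MDL can be estimated as the total negative log-likelihood of the source files under a code language model; the model name and helper below are placeholder assumptions.</p>
<pre><code class="language-python">import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def mdl_bits(paths, model_name="gpt2"):
    """Estimate MDL as the total negative log-likelihood (in bits) of code files.

    Hypothetical sketch: "gpt2" is an arbitrary stand-in for whatever code LM is
    used to score naturalness; long files would need chunking in practice.
    """
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    model.eval()
    total_bits = 0.0
    with torch.no_grad():
        for path in paths:
            text = open(path).read()
            enc = tok(text, return_tensors="pt")
            out = model(**enc, labels=enc["input_ids"])
            n_pred = enc["input_ids"].shape[1] - 1   # loss is averaged over these tokens
            total_bits += out.loss.item() * n_pred / math.log(2)
    return total_bits
</code></pre>
<p>Lower totals correspond to code that is both shorter and more natural under the model.</p>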
<p>Formally, given a set of original programs $\{\rho_n\}_{n=1}^N$, we want to find a new library $\mathcal{L}$ and refactored programs $\{\rho'_n\}_{n=1}^N$.
We define the pass rate $\tau(\rho_n)$ as the fraction of unit tests program $\rho_n$ passes.
In practice we are concerned both with refactoring several sources ($N>1$) and with refactoring a single large source ($N=1$).</p>
<p>Refactorings are evaluated using the following objective:</p>
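<p>A natural way to write this objective, consistent with the definitions above (shown here as a sketch rather than a verbatim reproduction), is to minimize the total description length of the library and the refactored programs, subject to each program's pass rate not decreasing:</p>
$$
\min_{\mathcal{L},\,\{\rho'_n\}} \;\; \mathrm{MDL}(\mathcal{L}) + \sum_{n=1}^{N} \mathrm{MDL}(\rho'_n \mid \mathcal{L})
\qquad \text{subject to} \quad \tau(\rho'_n) \ge \tau(\rho_n) \;\; \text{for all } n.
$$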
<h2id="librarian-method">Librarian: Refactoring Code to Create Libraries</h2>
169
+
<h2>The MiniCode Benchmark</h2>
<p>
We instantiate our evaluation across three splits of varying difficulty: large repositories, small repositories, and competition coding. In each split, agents must understand a collection of code sources, synthesize a set of shared abstractions into a library, and then refactor the code sources using that library.
The refactored code and library are evaluated on correctness and simplicity.
</p>
<h3>Repository Split</h3>
<p>
We synthesize both large-scale and small-scale Python repositories by prompting LMs. To obtain a collection of refactorable repositories, we prompt an LM to generate project ideas and then synthesize repositories as persona-driven variations of those ideas. Agents must create a unified <code>common</code> library package that is imported by the original repository packages.
</p>
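<p>As a hypothetical illustration (package and function names are invented), a refactor in this split replaces per-repository copies of a helper with an import from the shared package:</p>
<pre><code class="language-python"># Before (hypothetical): each repository keeps its own copy of the helper.
# repo_alpha/io_utils.py
def load_json(path):
    import json
    with open(path) as f:
        return json.load(f)

# After (hypothetical): the helper lives once in the shared `common` package,
# e.g. in common/io.py, and the original module simply imports it.
# repo_alpha/io_utils.py
from common.io import load_json
</code></pre>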
<h3>CodeContests Split</h3>
<p>
Sourced from the CodeContests dataset, this split uses competitive programming problems, which naturally contain shared concepts and test cases. Each collection provides multiple solutions, and the agent's task is to create a central <code>library.py</code> file that is imported into each refactored solution.
Check out the full benchmark <a href="https://github.com/code-refactor/minicode">here</a>.
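<p>A minimal hypothetical example of the expected layout: a shared routine moves into <code>library.py</code> and each refactored solution imports it.</p>
<pre><code class="language-python"># library.py (hypothetical shared abstraction)
def read_ints():
    """Read one line of whitespace-separated integers from stdin."""
    return list(map(int, input().split()))

# solution_042.py (hypothetical refactored solution)
from library import read_ints

a, b = read_ints()
print(a + b)
</code></pre>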
</section>
<section>
<h2id="librarian-method">Librarian: Refactoring Code to Create Libraries</h2>
<p>
Librarian is our method for refactoring existing code into a more organized and reusable library. By identifying common patterns and abstracting them into shared building blocks, Librarian compresses collections of programs while migrating them to use these new components—reducing overall code size and often improving functionality. The method operates on a simple sample-and-rerank framework, progressively building a library of useful functions to maximize our refactoring objective. <strong>Figure 1</strong> illustrates the overall process.
</p>
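<p>The sketch below is a hypothetical rendering of this loop (not the released implementation); the clustering, proposal, and scoring helpers are assumed placeholders for the steps described next.</p>
<pre><code class="language-python">from typing import Callable, Dict, List, Tuple

def librarian_loop(
    programs: List[str],
    cluster: Callable[[List[str]], List[List[str]]],        # group related programs into tuples
    propose: Callable[[List[str], List[str]], List[Dict]],  # LM proposes K candidate refactorings
    mdl: Callable[[Dict], float],                           # description length of a candidate
    pass_rate: Callable[[Dict], float],                     # fraction of unit tests a candidate passes
    original_pass_rate: Callable[[List[str]], float],       # baseline pass rate of the originals
) -> Tuple[List[str], List[Dict]]:
    """Hypothetical sample-and-rerank sketch; every helper is a placeholder."""
    library: List[str] = []      # grows as useful functions are discovered
    refactored: List[Dict] = []
    for tuple_of_programs in cluster(programs):
        candidates = propose(tuple_of_programs, library)
        # Keep candidates that match or beat the original tests, then pick
        # the one with the smallest description length (best compression).
        viable = [c for c in candidates
                  if pass_rate(c) >= original_pass_rate(tuple_of_programs)]
        if viable:
            best = min(viable, key=mdl)
            library.extend(best.get("new_functions", []))
            refactored.append(best)
        else:
            # Fall back to the originals when no candidate is at least as correct.
            refactored.append({"programs": tuple_of_programs, "new_functions": []})
    return library, refactored
</code></pre>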
<p>
Concretely, Librarian maximizes the refactoring objective described above with a simple sample-and-rerank loop, maintaining and growing a library of useful functions along the way:
<li><strong>Clustering:</strong> We group related input programs into "tuples" by having a language model summarize the code, then clustering these summaries. This focuses the language model's attention on relevant code chunks.</li>
<li><strong>Sampling Refactorings:</strong> For each tuple, Librarian retrieves relevant existing library functions. Then, using the original code and retrieved functions as context, a language model proposes K candidate refactorings.</li>
<li><strong>Ranking with Compression:</strong> All K candidates are evaluated. We select the candidate that compresses the code best (lowest MDL) while maintaining (or improving) test accuracy relative to the original code. New, useful library functions from the chosen refactoring are then added to Librarian's library for future use.</li>