Skip to content

Conversation

@largomst
Copy link

@largomst largomst commented Dec 3, 2024

Introduce @instructor-ai/instructor to add support for OpenAI-compatible APIs that support json_object.

@jehna
Copy link
Owner

jehna commented Dec 3, 2024

How does this exactly differ from using OpenAI's library? You can already change the base url

@jehna
Copy link
Owner

jehna commented Dec 3, 2024

Right, this would use json_object instead of json_schema. Afaik json_object results in much more mistakes in json response result, causing the need to use Zod to parse the result (while json_schema guarantees that the response matches schema).

Using this approach would need much better error handling than what's implemented now (not a bad thing per se) and result in retries, which would mean it costs more to run

@MohammedInTheCloud
Copy link

`
(base) PS D:\local\h2\humanify-main> pnpm start openai --model="deepseek-chat" --apiKey="sk-454****" --baseURL="https://api.deepseek.com/v1" "D:\\\****\VM7477refixed.js"

humanifyjs@2.2.2 start D:\local\h2\humanify-main
tsx src/index.ts "openai" "--model=deepseek-chat" "--apiKey=sk-********" "--baseURL=https://api.deepseek.com/v1" "D:\\\\\\\\VM7477refixed.js"

(node:21436) [DEP0040] DeprecationWarning: The punycode module is deprecated. Please use a userland alternative instead.
(Use node --trace-deprecation ... to show where the warning was created)
Processing file 1/1
Processing: 1%[Instructor:ERROR] 2025-01-21T00:17:43.451Z: response model: Result - Validation issues: Validation error: Required at "newName"
file:///D:/local/h2/humanify-main/node_modules/.pnpm/zod@3.24.1/node_modules/zod/lib/index.mjs:587
const error = new ZodError(ctx.common.issues);
^

ZodError: [
{
"code": "invalid_type",
"expected": "string",
"received": "undefined",
"path": [
"newName"
],
"message": "Required"
}
]
at get error (file:///D:/local/h2/humanify-main/node_modules/.pnpm/zod@3.24.1/node_modules/zod/lib/index.mjs:587:31)
at m (D:\local\h2\humanify-main\node_modules.pnpm@instructor-ai+instructor@1.6.0_openai@4.79.1_zod@3.24.1__zod@3.24.1\node_modules@instructor-ai\instructor\src\instructor.ts:227:56)
at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
at (D:\local\h2\humanify-main\src\plugins\openai\openai-rename.ts:31:24)
at visitAllIdentifiers (D:\local\h2\humanify-main\src\plugins\local-llm-rename\visit-all-identifiers.ts:42:21)
at (D:\local\h2\humanify-main\src\plugins\openai\openai-rename.ts:25:12)
at unminify (D:\local\h2\humanify-main\src\unminify.ts:26:27)
at Command. (D:\local\h2\humanify-main\src\commands\openai.ts:40:5) {
issues: [
{
code: 'invalid_type',
expected: 'string',
received: 'undefined',
path: [ 'newName' ],
message: 'Required'
}
],
addIssue: [Function (anonymous)],
addIssues: [Function (anonymous)],
errors: [
{
code: 'invalid_type',
expected: 'string',
received: 'undefined',
path: [ 'newName' ],
message: 'Required'
}
]
}

Node.js v22.5.1
 ELIFECYCLE  Command failed with exit code 1.`

@jehna
Copy link
Owner

jehna commented Jan 21, 2025

@MohammedInTheCloud ?

@MohammedInTheCloud
Copy link

MohammedInTheCloud commented Jan 21, 2025

I encountered an error when trying to use @largomst's solution with the DeepSeek API. The error occurs because the API response doesn't include the required "newName" field, causing the Zod validation to fail.

Here's the error I got when running the command:

pnpm start openai --model="deepseek-chat" --apiKey="sk-454****" --baseURL="https://api.deepseek.com/v1" "D:\**\**\****\VM7477refixed.js"

The key error message is:

[Instructor:ERROR] Validation error: Required at "newName"
ZodError: [ { "code": "invalid_type", "expected": "string", "received": "undefined", "path": [ "newName" ], "message": "Required" } ]

claude implemented several fixes to handle this issue:

  1. Added try/catch blocks around the API call
  2. Added null/undefined checking for the result
  3. Implemented fallback to return the original name if renaming fails
  4. Enhanced the system prompt for clearer response format requirements
  5. Added temperature parameter (0.2) for more consistent naming
  6. Improved logging for better debugging
import OpenAI from "openai";
import { visitAllIdentifiers } from "../local-llm-rename/visit-all-identifiers.js";
import { showPercentage } from "../../progress.js";
import { verbose } from "../../verbose.js";
import Instructor from "@instructor-ai/instructor";
import { z } from 'zod';

export function openaiRename({
  apiKey,
  baseURL,
  model,
  contextWindowSize
}: {
  apiKey: string;
  baseURL: string;
  model: string;
  contextWindowSize: number;
}) {
  const oai = new OpenAI({ apiKey, baseURL });
  const instructor = Instructor({
    client: oai,
    mode: "JSON",
  });

  return async (code: string): Promise<string> => {
    return await visitAllIdentifiers(
      code,
      async (name, surroundingCode) => {
        verbose.log(`Renaming ${name}`);
        verbose.log("Context: ", surroundingCode);

        try {
          const result = await instructor.chat.completions.create(
            toRenamePrompt(name, surroundingCode, model)
          );

          if (!result?.newName) {
            verbose.log(`Warning: Invalid response format for ${name}, keeping original name`);
            return name;
          }

          verbose.log(`Renamed to ${result.newName}`);
          return result.newName;
        } catch (error) {
          verbose.log(`Error renaming ${name}:`, error);
          return name; // Fallback to original name on error
        }
      },
      contextWindowSize,
      showPercentage
    );
  };
}

function toRenamePrompt(
  name: string,
  surroundingCode: string,
  model: string
) {
  const schema = z.object({
    newName: z.string({
      description: `The new name for the variable/function called \`${name}\``
    })
  });

  return {
    model,
    messages: [
      {
        role: "system",
        content: `You are a code reviewer tasked with improving code readability. Given a JavaScript/TypeScript identifier (variable or function name) and its surrounding code context, suggest a more descriptive name that reflects its purpose and usage. The current name is \`${name}\`. Respond with a JSON object containing the "newName" property.`
      },
      {
        role: "user",
        content: surroundingCode
      }
    ],
    response_model: {
      name: "Result",
      schema: schema
    },
    temperature: 0.2 // Lower temperature for more consistent naming
  };
}

@MohammedInTheCloud
Copy link

MohammedInTheCloud commented Jan 21, 2025

Successfully tested the tool on a large codebase (~30k LOC). It effectively deobfuscated the code using DeepSeek. Cost me about $0.60, and the result expanded to about 85k lines of code. Thanks @largomst @jehna

now i can rewrite this code, thanks to you @jehna !

@diolegend
Copy link

I get UnprocessableEntityError: 422 Failed to deserialize the JSON body into the target type: response_format: response_format.type json_schema is unavailable now at line 33 column 1

How to fix this problem? @largomst @jehna @MohammedInTheCloud @ocxo @0xdevalias

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants