asheshrambachan/LanguageModel_Labels

Large Language Models: An Applied Econometric Framework

This repository contains the code, data, and outputs associated with the paper "Large Language Models: An Applied Econometric Framework" by Jens Ludwig, Sendhil Mullainathan, and Ashesh Rambachan.

Repository Structure

This repository is organized to facilitate replication of the results presented in the paper. The subdirectories are grouped by task type (prediction or estimation) and application (financial news headlines or Congressional legislation), with two further directories for figures and tables:

  • prediction_headlines/: Code and data for prediction tasks involving financial news headlines.
  • prediction_legislation/: Code and data for prediction tasks involving Congressional legislation.
  • estimation_headlines/: Code and data for estimation tasks involving financial news headlines.
  • estimation_legislation/: Code and data for estimation tasks involving Congressional legislation.
  • figures/: Code and outputs for generating figures presented in the paper.
  • tables/: Code and outputs for generating tables presented in the paper.
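
As a quick sanity check after cloning, the layout above can be verified with a short script. This is only a sketch: the directory names are taken from the list above, while the helper name `missing_dirs` is hypothetical and not part of the repository.

```python
from pathlib import Path

# Directory names as listed in the Repository Structure section.
EXPECTED_DIRS = [
    "prediction_headlines",
    "prediction_legislation",
    "estimation_headlines",
    "estimation_legislation",
    "figures",
    "tables",
]


def missing_dirs(repo_root):
    """Return the expected subdirectories that are absent under repo_root."""
    root = Path(repo_root)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    # Assumes the script is run from the repository root.
    missing = missing_dirs(".")
    if missing:
        print("Missing directories:", ", ".join(missing))
    else:
        print("All expected directories are present.")
```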

Getting Started

Step 1: Set Up Your OpenAI API Key

To query LLMs, you need to add your API_KEY to a .env file:

  1. Visit the OpenAI API Keys page.
  2. Generate a new API key and copy it.
  3. Substitute your key into the following command and run it to create a .env file in the repository root (this overwrites any existing .env file):
    cd path/to/LanguageModel_Labels
    echo 'API_KEY="paste your key here"' > .env
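
To confirm that the .env file was written as intended, a small standard-library check along the following lines may help. It is a sketch under stated assumptions: it parses only the simple `API_KEY="..."` form produced by the command above, the helper names `read_env_key` and `key_looks_set` are hypothetical, and how the repository's own scripts load the key is not shown here.

```python
from pathlib import Path

# Placeholder text from the echo command in Step 1.
PLACEHOLDER = "paste your key here"


def read_env_key(env_path, name="API_KEY"):
    """Return the value assigned to `name` in a simple KEY="value" file, or None."""
    path = Path(env_path)
    if not path.is_file():
        return None
    for raw_line in path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == name:
            # Drop surrounding single or double quotes.
            return value.strip().strip("'\"")
    return None


def key_looks_set(value):
    """True if a value is present and is not the unedited placeholder."""
    return bool(value) and value != PLACEHOLDER


if __name__ == "__main__":
    # Assumes the script is run from the repository root.
    if key_looks_set(read_env_key(".env")):
        print("API_KEY found in .env")
    else:
        print("API_KEY is missing or still set to the placeholder")
```

The check deliberately never prints the key itself, so the output is safe to paste into an issue or log.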

Step 2: Set Up a Conda Environment

Create a Conda environment with the required dependencies for Python and R:

cd path/to/LanguageModel_Labels
conda update conda
conda config --set channel_priority strict
conda env create -f conda_llm_env.yaml
conda activate llm_env
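
Before running any replication step, it can be useful to confirm that the environment is active. The sketch below relies on the `CONDA_DEFAULT_ENV` variable that `conda activate` sets in the shell; the environment name `llm_env` comes from the command above, and the helper names are hypothetical.

```python
import os

# Environment name used in `conda activate llm_env`.
EXPECTED_ENV = "llm_env"


def active_conda_env(environ=None):
    """Return the name of the active Conda environment, or None if unset."""
    env = os.environ if environ is None else environ
    return env.get("CONDA_DEFAULT_ENV")


def in_expected_env(environ=None):
    """True if the active Conda environment matches EXPECTED_ENV."""
    return active_conda_env(environ) == EXPECTED_ENV


if __name__ == "__main__":
    name = active_conda_env()
    if in_expected_env():
        print(f"Conda environment '{name}' is active.")
    else:
        print(f"Expected '{EXPECTED_ENV}' but found {name!r}; run: conda activate {EXPECTED_ENV}")
```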

Step 3: Download Restricted Data

The tasks based on financial news headlines use restricted data provided by Wharton Research Data Services (WRDS). To access this data:

  1. Go to Beta Suite by WRDS and log in with your credentials.

  2. Configure the parameters as follows:

  3. Click the Submit Form button.

  4. Once the query status shows Success, click Download .csv Output.

  5. Rename the downloaded file to CAPM_returns.csv.

  6. Place it in the directory: ./estimation_headlines/data/raw
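
Once the file is in place, a short check can confirm that it sits at the expected path and opens as a CSV. This is a minimal sketch: the path is the one given in the last step, the helper name `csv_header` is hypothetical, and no assumption is made about which columns the WRDS export contains.

```python
import csv
from pathlib import Path

# Location given in the Step 3 instructions, relative to the repository root.
RAW_RELATIVE_PATH = Path("estimation_headlines") / "data" / "raw" / "CAPM_returns.csv"


def csv_header(repo_root):
    """Return the header row of CAPM_returns.csv, or None if it is missing or empty."""
    path = Path(repo_root) / RAW_RELATIVE_PATH
    if not path.is_file():
        return None
    with path.open(newline="", encoding="utf-8") as handle:
        return next(csv.reader(handle), None)


if __name__ == "__main__":
    # Assumes the script is run from the repository root.
    header = csv_header(".")
    if header is None:
        print(f"Not found or empty: {RAW_RELATIVE_PATH}")
    else:
        print(f"Found {RAW_RELATIVE_PATH} with {len(header)} columns: {header}")
```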

Replication Options

You can replicate the results presented in the paper using one of the following methods:

  1. Full Replication: Run the run_all.sh shell script to execute all steps in sequence:

    bash run_all.sh
    
  2. Partial Replication: Run individual scripts for specific steps. Each subdirectory contains a detailed README.md file with instructions for task-specific replication.

Citation

If you use this repository, please cite the paper:

@article{ludwig2024largelanguagemodelsapplied,
    title={Large Language Models: An Applied Econometric Framework}, 
    author={Jens Ludwig and Sendhil Mullainathan and Ashesh Rambachan},
    year={2024},
    journal={arXiv preprint arXiv:2412.07031},
    url={https://arxiv.org/abs/2412.07031}, 
}

Acknowledgement

Wharton Research Data Services (WRDS) was used in preparing this paper. This service and the data available thereon constitute valuable intellectual property and trade secrets of WRDS and/or its third-party suppliers.
