Steichen2024

This is the public code and data repository for Steichen et al. 2024.

Data

There are both local and remote datasets in this repository.

Local datasets

These datasets are stored in this repository, in the data/ directory.

data/all_processed_combined.csv.gz is the processed 10X dataset of paired heavy and light chains, combined with the paired single-cell sort data from Sanger sequencing. In addition to all AIRR-compliant fields, this dataset contains the following:

Field	Description
sequence_id	A unique sequence ID of the form `<animal id>_<weeks post>_<tissue>_<sequencing method>_<index_number>_<probe>`
barcode	The 10X GEM barcode
NHP	The NHP (non-human primate) animal ID
weeks_post	Weeks post-vaccination: 3, 4, 7, 10, 13, or 710 (week 7 or 10)
tissue	GC (germinal center) or MBC (memory B cell)
seq_method	Sequencing method: 10X or Sanger
prime_boost	Whether the sequence was isolated after the prime or the boost, or is a control (MD39)
probe	Probe population of the sequence: Env+, Env-, or Env++KO-
KO.	Number of counts for the barcode-labeled KO probe
well_id	Well ID from the single-cell sort
IgG	ELISA signal for IgG
N332-GT5 WT	ELISA signal for N332-GT5 WT
N332-GT5 KO	ELISA signal for the N332-GT5 KO
B23	ELISA signal for the boosting reagent (B23)
IgG.1	Boolean indicating whether the antibody is an IgG
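The following is a minimal Python sketch (standard library only) for streaming rows from the gzipped CSV and splitting sequence_id into the six components listed above. It assumes the file is comma-separated with a header row; the helper names (iter_rows, parse_sequence_id, weeks_post_values) and the example ID are hypothetical and not part of this repository. Splitting from the right means an animal ID that itself contains underscores (an assumption; this is not documented) would still parse.

```python
import csv
import gzip
from typing import Dict, Iterator, Tuple

# Component order of sequence_id as documented in the field table above.
SEQUENCE_ID_PARTS = (
    "animal_id", "weeks_post", "tissue", "seq_method", "index_number", "probe",
)


def iter_rows(path: str) -> Iterator[Dict[str, str]]:
    """Stream rows (as dicts of strings) from a gzipped CSV with a header row."""
    with gzip.open(path, mode="rt", newline="") as handle:
        yield from csv.DictReader(handle)


def parse_sequence_id(sequence_id: str) -> Dict[str, str]:
    """Split a sequence_id into its six documented components.

    Splitting from the right keeps any underscores inside the animal ID
    (an assumption, not documented) attached to the first component.
    """
    parts = sequence_id.rsplit("_", maxsplit=len(SEQUENCE_ID_PARTS) - 1)
    if len(parts) != len(SEQUENCE_ID_PARTS):
        raise ValueError(f"unexpected sequence_id format: {sequence_id!r}")
    return dict(zip(SEQUENCE_ID_PARTS, parts))


def weeks_post_values(weeks_post: str) -> Tuple[int, ...]:
    """Expand a weeks_post code; 710 is documented as 'week 7 or 10'."""
    code = int(float(weeks_post))  # tolerate values written as e.g. "7.0"
    return (7, 10) if code == 710 else (code,)
```

For example, parse_sequence_id applied to the hypothetical ID A1_7_GC_10X_12_Env+ gives animal_id "A1", weeks_post "7", tissue "GC", seq_method "10X", index_number "12" and probe "Env+".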

data/all_processed_combined_personalized.csv.gz is the same as data/all_processed_combined.csv.gz but with the sequences personalized to the IGHD3-43 allele haplotypes found in data/genotypes.xlsx. The haplotypes are:

a. IGHD3-43*01/IGHD3-43*01

b. IGHD3-43*01/IGHD3-43*01_S8240

c. IGHD3-43*01_S8240/IGHD3-43*01_S8240
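To make the three groups above concrete, here is a minimal sketch that maps an animal's two IGHD3-43 allele calls to the labels a, b and c. It assumes the allele names are spelled exactly as in the list above; how data/genotypes.xlsx actually stores them is not described here, and the function name ighd3_43_group is hypothetical.

```python
from collections import Counter

WT_ALLELE = "IGHD3-43*01"
VARIANT_ALLELE = "IGHD3-43*01_S8240"

# (copies of WT_ALLELE, copies of VARIANT_ALLELE) -> label used in this README.
GROUP_LABELS = {
    (2, 0): "a",  # IGHD3-43*01/IGHD3-43*01
    (1, 1): "b",  # IGHD3-43*01/IGHD3-43*01_S8240
    (0, 2): "c",  # IGHD3-43*01_S8240/IGHD3-43*01_S8240
}


def ighd3_43_group(allele_1: str, allele_2: str) -> str:
    """Return the a/b/c label for a pair of IGHD3-43 allele calls (order-independent)."""
    counts = Counter((allele_1, allele_2))
    unknown = set(counts) - {WT_ALLELE, VARIANT_ALLELE}
    if unknown:
        raise ValueError(f"unexpected IGHD3-43 allele(s): {sorted(unknown)}")
    return GROUP_LABELS[(counts[WT_ALLELE], counts[VARIANT_ALLELE])]
```

Counting alleles rather than comparing ordered pairs means "IGHD3-43*01_S8240/IGHD3-43*01" and "IGHD3-43*01/IGHD3-43*01_S8240" both map to b.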

Remote datasets

These datasets are too large to store in this repository, so they are stored in an AWS S3 bucket.

74 Macaque naive BCR sequences found in this study

a. Annotated, in Feather format, and packaged as a compressed tar archive here. This version does not contain the animal IDs but does contain the SRA accession numbers.

b. Annotated, in Parquet format, and packaged as a compressed tar archive here. This version contains the animal IDs and, once uncompressed, can be used in AWS EMR.
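The download links and the internal layout of the archives are not reproduced here, so the following is only a sketch, assuming a local copy of one of the archives. It uses Python's tarfile module to extract just the Parquet or Feather members into a destination directory and refuses any member whose path would escape that directory. The function name extract_data_files and the default suffixes are assumptions.

```python
import tarfile
from pathlib import Path
from typing import List, Tuple


def extract_data_files(archive: str, dest: str,
                       suffixes: Tuple[str, ...] = (".parquet", ".feather")) -> List[str]:
    """Extract only data-file members of a tar archive into dest; return their paths."""
    dest_path = Path(dest).resolve()
    written: List[str] = []
    # "r:*" lets tarfile detect gzip/bz2/xz compression or read a plain tar.
    with tarfile.open(archive, mode="r:*") as tar:
        for member in tar.getmembers():
            if not member.isfile() or Path(member.name).suffix not in suffixes:
                continue
            target = (dest_path / member.name).resolve()
            # Refuse members such as "../x.parquet" that would land outside dest.
            if dest_path not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
            src = tar.extractfile(member)
            if src is None:
                continue
            target.parent.mkdir(parents=True, exist_ok=True)
            with src:
                target.write_bytes(src.read())
            written.append(str(target))
    return sorted(written)
```

Copying member bytes through extractfile, rather than calling tar.extract, keeps the path check explicit and behaves the same across Python versions. Reading the extracted files still needs a Parquet/Feather reader, which is not shown.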

Notebooks

We also include the local and EMR notebooks used in this study.

Local Notebooks

Personalize Seqs.ipynb reads all 10X data from data/all_processed_combined.feather and reannotates the sequences based on the IGHD3-43 allele haplotypes found in data/genotypes.xlsx.
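The notebook's exact reannotation procedure is not described in this README. One common approach, shown here only as a hedged sketch, is to build a per-animal germline reference that keeps only the IGHD3-43 alleles the animal carries and then rerun the annotation against that reference. The FASTA parser, the function names, and the assumption that reference allele names match the spelling used in this README are all hypothetical.

```python
from typing import Dict, Iterable, Iterator, Set, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs; the name is the first word of each header."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def personalize_reference(records: Iterable[Tuple[str, str]], carried: Set[str],
                          gene: str = "IGHD3-43") -> Dict[str, str]:
    """Drop alleles of `gene` that the animal does not carry; keep every other gene."""
    reference = {}
    for name, seq in records:
        if name.startswith(gene + "*") and name not in carried:
            continue
        reference[name] = seq
    return reference
```

For an animal in group a, carried would be {"IGHD3-43*01"}, so the IGHD3-43*01_S8240 entry is removed while unrelated D genes are kept.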

Process Seqs.ipynb reads the paired, personalized 10X and single-cell sort sequences from data/all_processed_combined_personalized.feather and adds new fields by running the following steps:

Add the closest human ortholog of the V and J genes
Add the HCDR3 and LCDR3 lengths
Annotate sequences with the BG18 type I criteria
Annotate sequences with the BG18 type I criteria using the alternative D3-41 reading frame
Assign precursor definitions
Run the mutational analysis
Run clustering on BG18 sequences using the criteria from the paper
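For the HCDR3 and LCDR3 length step, a minimal sketch, assuming the lengths are taken from the AIRR amino-acid fields. In the AIRR schema the junction is the CDR3 plus the two conserved flanking residues (the cysteine and the tryptophan/phenylalanine), so a CDR3 length can also be recovered as the junction length minus two. The README does not state which CDR3 definition the notebook uses (IMGT and Kabat lengths differ), and the function name cdr3_length is hypothetical.

```python
from typing import Optional


def _ungapped(seq: str) -> str:
    """Remove alignment gap characters ('.' or '-') if present."""
    return seq.replace(".", "").replace("-", "")


def cdr3_length(cdr3_aa: Optional[str] = None,
                junction_aa: Optional[str] = None) -> Optional[int]:
    """Return a CDR3 length in amino acids from AIRR fields.

    Prefers cdr3_aa; otherwise uses junction_aa minus the two conserved
    flanking residues. Non-string values (None, or NaN from a DataFrame)
    are treated as missing; returns None when neither field is usable.
    """
    if isinstance(cdr3_aa, str) and cdr3_aa:
        return len(_ungapped(cdr3_aa))
    if isinstance(junction_aa, str) and junction_aa:
        return len(_ungapped(junction_aa)) - 2
    return None
```

For paired data, the same function would be applied once to the heavy-chain fields (HCDR3) and once to the light-chain fields (LCDR3).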

EMR Notebooks

BG18 human precursors searches for human BG18-like precursors and calculates their frequencies in previously described NGS datasets of 1.1 billion human BCR heavy-chain sequences from 14 human donors (Briney et al., 2019; Steichen et al., 2019; Willis et al., 2023).

BG18 macaque precursors searches for rhesus macaque BG18-like precursors and calculates their frequencies in 154 datasets comprising 95.4 million macaque BCR sequences from 60 macaques.
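Assuming a precursor frequency is the number of BG18-like hits divided by the number of sequences searched, a minimal sketch of that arithmetic follows. It computes per-donor frequencies and a pooled frequency (total hits over total sequences); how the notebooks actually aggregate across donors is not described here, and the function names and counts are hypothetical.

```python
from typing import Dict, Tuple

# donor -> (BG18-like hits, total sequences searched)
Counts = Dict[str, Tuple[int, int]]


def per_donor_frequency(counts: Counts) -> Dict[str, float]:
    """Hits / sequences for each donor; donors with no sequences are skipped."""
    return {donor: hits / total
            for donor, (hits, total) in counts.items() if total > 0}


def pooled_frequency(counts: Counts) -> float:
    """Total hits / total sequences across all donors (0.0 if nothing was searched)."""
    hits = sum(h for h, _ in counts.values())
    total = sum(t for _, t in counts.values())
    return hits / total if total else 0.0


def one_in_n(frequency: float) -> float:
    """Express a frequency as '1 in N'; infinite when no hits were found."""
    return 1.0 / frequency if frequency > 0 else float("inf")
```

The pooled value weights donors by sequencing depth, so it can differ from the mean of per-donor frequencies: with 3 hits in 1,000,000 sequences for one donor and 0 hits in 500,000 for another, the pooled frequency is 3/1,500,000 = 2e-6, whereas the mean of the per-donor frequencies is 1.5e-6.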

SchiefLab/Steichen2024