
Physics-Informed AI for Stellar Parameter Prediction

A Research Knowledge Center for Spectroscopic Machine Learning
From 430+ experiments to reproducible papers on stellar surface gravity estimation

R² vs SNR: ML Methods & Theoretical Optimum

Figure: ViT achieves R²=0.71 on 1M spectra at SNR=5, approaching the Fisher/CRLB theoretical optimum (R²=0.87) and significantly outperforming LightGBM (R²=0.61) and template fitting (R²=0.40).

SpecViT Pipeline (1D Spectra)

Figure: SpecViT architecture for 1D stellar spectra — (1) Input 4096-dim spectrum → (2) Noise injection → (3) Tokenization/Patch embedding → (4) Positional encoding + [CLS] → (5) Transformer Encoder ×6 → (6) Regression head → log(g)


Abstract

This repository is an experiment-driven knowledge center for physics-informed machine learning applied to stellar spectroscopy. We systematically predict stellar surface gravity ($\log g$) from 4096-dimensional synthetic BOSZ spectra, spanning linear baselines (Ridge, PCA) through gradient boosting (LightGBM) to deep architectures (CNN, Vision Transformer, Mixture of Experts).

Our core contributions:

  1. Fisher/CRLB ceiling analysis establishing theoretical upper bounds ($R^2_{\max}=0.87$ at SNR=7)
  2. ViT scaling law demonstrating $R^2=0.71$ on 1M spectra at noise_level=1.0 — the current state-of-the-art
  3. Physics-gated MoE achieving $\rho=1.00$ oracle gain retention with soft routing
  4. Design principles distilled from 430+ experiments guiding architecture choices

What This Repository Is

This is a paper factory — an experiment log repository organized to produce publishable research. Each paper track has:

  • A clear thesis and contribution claims
  • Evidence trails with reproducible experiments
  • Figure plans with asset paths
  • Maturity status and open questions

Not a codebase: Training code lives in separate repos (~/VIT, ~/BlindSpotDenoiser, ~/SpecDiffusion). This repo consolidates findings, figures, and design principles.


Start Here

| Entry Point | Purpose | Time |
|---|---|---|
| Core figure | ViT vs Fisher ceiling | 1 min |
| EN/logg_main_20251130_en.md | Master experiment log & NN guidelines | 15 min |
| EN/moe_hub_20251203_en.md | MoE research deep-dive | 10 min |
| logg/scaling/fisher_hub_20251225.md | Theoretical upper-bound analysis | 10 min |
| design/principles.md | Consolidated design principles | 5 min |

Problem Setup

Task

Predict stellar surface gravity $\log g \in [1.0, 5.0]$ from 4096-dimensional spectral flux.

Data

| Parameter | Value |
|---|---|
| Source | BOSZ synthetic spectral library (Bohlin+ 2017) |
| Wavelength | 7100–8850 Å (PFS red arm) |
| Dimensions | 4096 flux values |
| Parameter space | $T_{\text{eff}} \in [3750, 6000]$ K, $[\text{M/H}] \in [-0.25, 0.75]$, $\log g \in [1, 5]$ |
| Train/Val/Test | 1M / 1k / 10k (primary); 32k–100k for ablations |

Noise Protocol

| Level | Noise Model | Approx. Magnitude |
|---|---|---|
| noise=0.0 | Clean | - |
| noise=0.1 | $y = x + 0.1 \cdot \sigma \odot \epsilon$ | mag ≈ 20 |
| noise=1.0 | $y = x + 1.0 \cdot \sigma \odot \epsilon$ | mag ≈ 21.5 |
| noise=2.0 | High noise | mag ≈ 22+ |

All noise is heteroscedastic Gaussian, with per-pixel $\sigma$ taken from the PFS exposure simulator.
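
For concreteness, a minimal NumPy sketch of this protocol (the function and variable names are illustrative, not the training repos' actual API):

```python
import numpy as np

def inject_noise(x, sigma, noise_level, rng=None):
    """Heteroscedastic Gaussian noise: y = x + noise_level * sigma * eps.

    x     : (n_pix,) median-normalized flux
    sigma : (n_pix,) per-pixel error from the PFS exposure simulator
    """
    rng = np.random.default_rng() if rng is None else rng
    eps = rng.standard_normal(x.shape)      # eps ~ N(0, 1) per pixel
    return x + noise_level * sigma * eps

# noise_level=1.0 corresponds to mag ~= 21.5 in the table above:
# y = inject_noise(flux, pfs_sigma, noise_level=1.0)
```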

Metrics

  • Primary: $R^2$ (coefficient of determination)
  • Secondary: MAE (dex), RMSE (dex)
  • Efficiency: $\eta = R^2_{\text{model}} / R^2_{\max}$
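
All three metrics are a few lines of NumPy; a sketch (the final comment plugs in the Track A numbers quoted below):

```python
import numpy as np

def r2(y_true, y_pred):
    """Coefficient of determination on log g labels."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def mae(y_true, y_pred):
    """Mean absolute error, in dex for log g."""
    return np.mean(np.abs(y_true - y_pred))

def efficiency(r2_model, r2_max):
    """eta = R^2_model / R^2_max: fraction of the Fisher/CRLB ceiling reached."""
    return r2_model / r2_max

# e.g. efficiency(0.71, 0.87) reproduces the ~81% ViT efficiency quoted in Track A
```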

Paper Tracks (Paper Factory)

Overview

| Track | Core Idea | Key Assets | Status |
|---|---|---|---|
| A: ViT/Swin-1D | Transformer inductive bias for spectra | ViT $R^2=0.71$ @ 1M | 🟢 Draftable |
| B: Physics-Gated MoE | Soft routing with metallicity-based experts | $\rho=1.00$, 9-expert | 🟢 Draftable |
| C: Fisher Ceiling | CRLB upper bound + efficiency analysis | $R^2_{\max}=0.87$, SNR curves | 🟢 Draftable |
| D: Diffusion Denoising | Representation learning for regression | 46% wMAE improvement | 🟡 Needs work |
| E: Distillation/Probing | Linear shortcut + latent extraction | seg_mean +150% | 🟡 Needs work |
| F: Design Principles | Systematic benchmark paper | 430+ experiments | 🟢 Draftable |

Track A: Vision Transformer for Stellar Spectra

Working Titles:

  1. "Vision Transformer Meets Stellar Spectroscopy: Scaling Laws for Surface Gravity Estimation"
  2. "ViT-Spectra: Breaking the LightGBM Ceiling with 1M Training Samples"

One-line thesis: Vision Transformer with 1D patch embedding achieves $R^2=0.71$ on 1M spectra at SNR=5, establishing a new state-of-the-art that significantly outperforms gradient boosting baselines.

Contributions:

  1. First systematic study of ViT architecture for 1D spectral regression (a minimal sketch follows this list)
  2. Scaling law: $R^2$ continues improving from 100k to 1M samples
  3. Patch embedding comparison: 1D-CNN vs sliding window
  4. Approaching Fisher/CRLB theoretical limit (81% efficiency at SNR=5)
  5. Loss function ablation: MSE vs L1 for label prediction
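
For orientation, a minimal PyTorch sketch of the pipeline in the SpecViT figure above. Depth and width follow the L6-H256 entry in Key Numbers; the patch size and head count are assumptions, and the actual training code lives in ~/VIT, so treat this as an illustrative re-implementation only:

```python
import torch
import torch.nn as nn

class SpecViT1D(nn.Module):
    """(1) 4096-dim spectrum -> (3) patch tokens -> (4) [CLS] + positions
    -> (5) Transformer encoder x6 -> (6) regression head -> log g."""

    def __init__(self, n_pix=4096, patch=16, dim=256, depth=6, heads=8):
        super().__init__()
        n_tokens = n_pix // patch
        # 1D-CNN patch embedding: a strided conv tokenizes the spectrum
        self.patch_embed = nn.Conv1d(1, dim, kernel_size=patch, stride=patch)
        self.cls = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos = nn.Parameter(torch.zeros(1, n_tokens + 1, dim))
        layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, dim_feedforward=4 * dim,
            batch_first=True, norm_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, 1))

    def forward(self, x):                       # x: (B, 4096) noisy flux
        t = self.patch_embed(x.unsqueeze(1))    # (B, dim, n_tokens)
        t = t.transpose(1, 2)                   # (B, n_tokens, dim)
        cls = self.cls.expand(x.size(0), -1, -1)
        t = torch.cat([cls, t], dim=1) + self.pos
        t = self.encoder(t)
        return self.head(t[:, 0]).squeeze(-1)   # predicted log g
```

The sliding-window variant of contribution 3 replaces the strided conv with overlapping windows (stride smaller than the kernel), at the cost of more tokens.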

Evidence Map:

| Asset | Path | Description |
|---|---|---|
| Core figure | assets/r2_vs_snr_ceiling_test_10k_unified_snr.png | ViT vs ceiling |
| ViT experiment | logg/vit/exp_vit_1m_scaling_20251226.md | 1M training details |
| R² alignment | logg/vit/note_r2_alignment_20251226.md | Metric verification |
| CNN baseline | logg/cnn/exp_cnn_dilated_kernel_sweep_20251201.md | k=9 optimal |
| Scaling hub | logg/scaling/scaling_hub_20251222.md | Strategy overview |

Figure Plan:

| # | Figure | Source | Status |
|---|---|---|---|
| 1 | R² vs SNR with Fisher ceiling | assets/r2_vs_snr_ceiling_test_10k_unified_snr.png | ✅ Ready |
| 2 | Training curves (loss, R² vs epoch) | WandB khgqjngm | ✅ Ready |
| 3 | Attention maps on spectral lines | [TBD] | ⏳ To generate |
| 4 | Scaling law: R² vs train size | [TBD] | ⏳ To generate |
| 5 | Patch embedding ablation | logg/vit/ | ✅ Ready |

Minimal Repro:

  • Data: /datascope/subaru/user/swei20/data/bosz50000/z0/mag205_225_lowT_1M/
  • Config: ~/VIT/configs/exp/vit_1m_large.yaml
  • Script: python ~/VIT/scripts/train_vit_1m.py --gpu 0
  • Checkpoint: ~/VIT/checkpoints/vit_1m/

Risks & Open Questions:

  • Attention visualization for interpretability
  • Comparison with CNN at matched parameter count
  • Generalization to real LAMOST/APOGEE spectra

Track B: Physics-Gated Mixture of Experts

Working Titles:

  1. "Physics-Gated MoE for Stellar Parameter Estimation: When Soft Routing Beats Hard Decisions"
  2. "Metallicity-Conditioned Experts: Exploiting Piecewise Structure in Spectral Regression"

One-line thesis: Physics-based expert partitioning by metallicity [M/H] with soft routing achieves 100% oracle gain retention ($\rho=1.00$), demonstrating that the $\log g$–spectrum mapping is piecewise simple.

Contributions:

  1. [M/H] contributes 68.7% of MoE gains (vs 42.9% for $T_{\text{eff}}$)
  2. Soft routing is critical: $\rho_{\text{soft}}=1.00$ vs $\rho_{\text{hard}}=0.72$ (see the routing sketch below)
  3. Physics-window gate features (Ca II triplet) enable deployable gating
  4. 9-expert MoE achieves $R^2=0.93$ vs global Ridge $R^2=0.86$
  5. Continuous conditioning matches 80% of discrete MoE gains
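
A schematic of the soft-vs-hard comparison (illustrative NumPy; in the actual experiments the experts are per-[M/H]-bin regressors and the gate weights come from physics-window features such as the Ca II triplet):

```python
import numpy as np

def moe_predict(x, experts, gate_weights, soft=True):
    """Combine expert predictions under a metallicity-based gate.

    experts      : list of k fitted regressors exposing .predict(x)
    gate_weights : (n, k) soft assignment probabilities from the gate
    """
    preds = np.stack([e.predict(x) for e in experts], axis=1)   # (n, k)
    if soft:
        # soft routing: probability-weighted mixture (rho = 1.00)
        return (gate_weights * preds).sum(axis=1)
    # hard routing: winner-take-all expert (rho = 0.72)
    return preds[np.arange(len(x)), gate_weights.argmax(axis=1)]
```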

Evidence Map:

| Asset | Path | Description |
|---|---|---|
| MoE hub | EN/moe_hub_20251203_en.md | Full analysis |
| Roadmap | logg/moe/moe_roadmap_20251203.md | Experiment tracking |
| Physics gate | logg/moe/exp/exp_moe_phys_gate_baseline_20251204.md | Gate design |
| 9-expert | logg/moe/exp/exp_moe_9expert_phys_gate_20251204.md | Scaling to 9 experts |
| Ablations | logg/moe/exp/exp_moe_rigorous_validation_20251203.md | [M/H] vs $T_{\text{eff}}$ |
| Gate images | logg/moe/img/ | 75 figures |

Figure Plan:

| # | Figure | Source | Status |
|---|---|---|---|
| 1 | Oracle vs pseudo vs soft routing | logg/moe/img/ | ✅ Ready |
| 2 | [M/H] contribution ablation | MVP-1.1 | ✅ Ready |
| 3 | Ca II triplet feature importance | MVP-5.0 | ✅ Ready |
| 4 | 9-bin coverage heatmap | [TBD] | ⏳ To generate |
| 5 | R² by expert bin | logg/moe/img/ | ✅ Ready |

Minimal Repro:

  • Data: BOSZ 50k, noise=0.2, mask-aligned subsets
  • Script: ~/VIT/scripts/moe_phys_gate.py
  • Configs: 3-expert [M/H] binning

Risks & Open Questions:

  • Coverage for out-of-range [M/H] samples (184/1000 uncovered)
  • NN experts underperform Ridge — architecture issue?
  • Regression-optimal gate weights vs classification-optimal

Track C: Fisher/CRLB Theoretical Ceiling

Working Titles:

  1. "How Well Can We Measure $\log g$? Fisher Information Bounds for Spectroscopic Estimation"
  2. "The Ceiling, the Gap, and the Structure: Information-Theoretic Limits on Stellar Parameter Inference"

One-line thesis: Fisher/CRLB analysis reveals $R^2_{\max}=0.87$ at SNR=7, establishing a 32% headroom gap above current ML methods and identifying SNR≈4 as a critical information cliff.

Contributions:

  1. Marginal CRLB via Schur complement for $\log g$ given nuisance parameters (sketched in code below)
  2. $R^2_{\max}$ vs SNR curve with 10-90% confidence bands
  3. Information cliff at SNR < 4 (median $R^2_{\max}=0$ at mag=23)
  4. Schur decay ≈ 0.69 constant across SNR (degeneracy is physical)
  5. 5D ceiling (with chemical abundance nuisance) only 2% lower than 3D
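
The ceiling computation itself is compact. A sketch, assuming a finite-difference Jacobian of the flux with respect to the parameters (as in the V2 grid experiment) with $\log g$ as the first parameter:

```python
import numpy as np

def marginal_crlb_logg(J, sigma):
    """Marginal CRLB for log g with nuisance parameters, via Schur complement.

    J     : (n_pix, n_par) Jacobian d(flux)/d(theta), theta[0] = log g
    sigma : (n_pix,) per-pixel noise std (heteroscedastic Gaussian)
    """
    # Fisher information for Gaussian noise: F = J^T diag(1/sigma^2) J
    F = J.T @ (J / sigma[:, None] ** 2)
    # Schur complement of the nuisance block: accounts for degeneracy
    # of log g with T_eff, [M/H], and any abundance nuisances
    F_marg = F[0, 0] - F[0, 1:] @ np.linalg.solve(F[1:, 1:], F[1:, 0])
    return 1.0 / F_marg   # minimum variance of any unbiased estimator

# One way to aggregate per-sample CRLBs into a ceiling over a test set:
# r2_max = 1 - np.mean(crlb_per_sample) / np.var(logg_labels)
```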

Evidence Map:

| Asset | Path | Description |
|---|---|---|
| Fisher hub | logg/scaling/fisher_hub_20251225.md | Strategy |
| V2 experiment | logg/scaling/exp/exp_scaling_fisher_ceiling_v2_20251224.md | Grid-based |
| Multi-mag | logg/scaling/exp/exp_scaling_fisher_multi_mag_20251224.md | SNR sweep |
| 5D ceiling | logg/scaling/exp/exp_scaling_fisher_5d_multi_mag_20251226.md | Robustness |
| Upper-bound curves | logg/scaling/exp/exp_scaling_fisher_upperbound_curves_20251225.md | Main figures |
| Images | logg/scaling/img/ | 22 figures |

Figure Plan:

| # | Figure | Source | Status |
|---|---|---|---|
| 1 | $R^2_{\max}$ vs SNR | logg/scaling/img/fisher_upperbound_r2max_vs_snr.png | ✅ Ready |
| 2 | $\sigma_{\min}$ vs SNR | logg/scaling/img/fisher_upperbound_sigma_vs_snr.png | ✅ Ready |
| 3 | Model efficiency heatmap | [TBD] | ⏳ To generate |
| 4 | Schur decay constancy | logg/scaling/img/fisher_multi_mag_schur.png | ✅ Ready |
| 5 | CRLB distribution (V2 validation) | logg/scaling/img/fisher_ceiling_v2_crlb_dist.png | ✅ Ready |

Minimal Repro:

  • Data: Grid datasets data/bosz50k/z0/grid_mag{18,20,215,22,225,23}_lowT/
  • Script: ~/VIT/scripts/scaling_fisher_ceiling_v2.py
  • Output: CRLB per sample → $R^2_{\max}$ distribution

Risks & Open Questions:

  • Step-size sensitivity in finite differencing
  • Extension to real noise (non-Gaussian tails)
  • Weighted loss experiments to approach ceiling

Track D: Diffusion-Based Spectral Denoising

Working Titles:

  1. "Diffusion Denoising for Stellar Spectra: Trading Hallucinations for Robustness"
  2. "Residual Diffusion: Controlled Spectral Denoising with wMAE Loss"

One-line thesis: Residual diffusion with intensity-controlled output ($y + s \cdot g_\theta$) and weighted MAE loss achieves 46% denoising improvement while maintaining strict identity at $s=0$.

Contributions:

  1. Standard DDPM fails on 1D spectra (loss convergence ≠ sampling success)
  2. Bounded noise denoiser with $\lambda=0.5$ achieves 59.5% MSE reduction
  3. Residual formula with $s$-prefactor ensures controllable output (see the sketch below)
  4. wMAE loss (1/σ weighting) protects high-SNR regions
  5. Identity guarantee: $s=0 \Rightarrow \text{wMAE}=0.0000$
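
A PyTorch sketch of the two core ingredients, the $s$-controlled residual output and the $1/\sigma$-weighted MAE (names are illustrative; the trained model lives in ~/SpecDiffusion):

```python
import torch

def residual_output(y, g_theta_y, s):
    """Intensity-controlled denoiser: x_hat = y + s * g_theta(y).

    At s=0 the output equals the input exactly, which is the
    identity guarantee (wMAE = 0) in contribution 5.
    """
    return y + s * g_theta_y

def wmae_loss(x_hat, x_clean, sigma):
    """Weighted MAE with 1/sigma weights: errors cost most where the
    per-pixel sigma is small, protecting high-SNR regions."""
    return torch.mean(torch.abs(x_hat - x_clean) / sigma)
```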

Evidence Map:

| Asset | Path | Description |
|---|---|---|
| Diffusion hub | logg/diffusion/diffusion_hub_20251206.md | Strategy |
| Roadmap | logg/diffusion/diffusion_roadmap_20251206.md | Experiment tracking |
| Baseline (failed) | logg/diffusion/exp/exp_diffusion_baseline_20251203.md | DDPM failure |
| Bounded noise | logg/diffusion/exp/exp_diffusion_bounded_noise_denoiser_20251204.md | Success |
| wMAE residual | logg/diffusion/exp/exp_diffusion_wmae_residual_denoiser_20251204.md | Best |
| Images | logg/diffusion/img/ | 30 figures |

Figure Plan:

| # | Figure | Source | Status |
|---|---|---|---|
| 1 | Denoising sample comparison | logg/diffusion/img/diff_wmae_denoising_samples.png | ✅ Ready |
| 2 | wMAE vs $s$ intensity | logg/diffusion/img/diff_wmae_comparison.png | ✅ Ready |
| 3 | Residual distribution | logg/diffusion/img/diff_wmae_residual_dist.png | ✅ Ready |
| 4 | Downstream $\log g$ bias | [TBD] | ⏳ To generate |

Minimal Repro:

  • Script: ~/SpecDiffusion/scripts/train_residual_denoiser.py
  • Config: s=0.2, loss=wMAE

Risks & Open Questions:

  • Downstream parameter bias from denoising
  • DPS posterior sampling implementation
  • Generalization to real observed spectra

Track E: Distillation & Latent Probing

Working Titles:

  1. "Linear Shortcuts in Spectral Regression: Why Ridge Remains a Strong Baseline"
  2. "Probing Denoiser Latents: Representation Learning for Stellar Parameters"

One-line thesis: The $\log g$–flux mapping is fundamentally linear ($R^2=0.999$ at noise=0); neural networks should learn residuals, and denoiser latents with segmented pooling yield +150% over global mean.

Contributions:

  1. Ridge achieves $R^2=0.999$ at noise=0 — mapping is linear
  2. Residual MLP ($\hat{y} = w^\top x + g_\theta(x)$) outperforms direct MLP (sketched below)
  3. Latent extraction: seg_mean_K8 achieves $R^2=0.55$ vs 0.22 baseline
  4. Wavelength locality preservation is critical (+77.6%)
  5. Error channel alone achieves $R^2=0.91$ (information leakage)
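
A minimal PyTorch sketch of the residual formulation in contribution 2 (hidden sizes follow the [256, 64] MLP config in Key Numbers; illustrative, not the ~/VIT implementation):

```python
import torch
import torch.nn as nn

class ResidualMLP(nn.Module):
    """y_hat = w^T x + g_theta(x): linear shortcut plus a learned residual.

    Since the log g-flux mapping is nearly linear, the MLP branch only
    has to model the (small) nonlinear remainder.
    """
    def __init__(self, n_pix=4096, hidden=(256, 64)):
        super().__init__()
        self.shortcut = nn.Linear(n_pix, 1)        # w^T x
        self.residual = nn.Sequential(             # g_theta(x)
            nn.Linear(n_pix, hidden[0]), nn.ReLU(),
            nn.Linear(hidden[0], hidden[1]), nn.ReLU(),
            nn.Linear(hidden[1], 1))

    def forward(self, x):                          # x: (B, 4096)
        return (self.shortcut(x) + self.residual(x)).squeeze(-1)
```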

Evidence Map:

| Asset | Path | Description |
|---|---|---|
| Main log | EN/logg_main_20251130_en.md | Overview |
| Distill experiments | logg/distill/exp/ | 4 experiments |
| Latent extraction | logg/distill/exp/exp_latent_extraction_logg_20251201.md | seg_mean |
| Ridge baseline | logg/ridge/ridge_hub_20251223.md | Linear analysis |
| NN analysis | logg/NN/exp/exp_nn_comprehensive_analysis_20251130.md | MLP |

Risks & Open Questions:

  • End-to-end fine-tuning vs frozen encoder
  • Transfer to other stellar parameters ($T_{\text{eff}}$, [M/H])

Track F: Systematic Design Principles (Benchmark Paper)

Working Titles:

  1. "430 Experiments Later: Design Principles for Spectroscopic Machine Learning"
  2. "From Ridge to Transformer: A Practitioner's Guide to Stellar Parameter Estimation"

One-line thesis: Systematic ablation across 430+ experiments yields actionable design principles: small-kernel CNN (k=9), linear shortcuts, Ridge α scaling with noise, and data volume as the primary lever.

Contributions:

  1. Model leaderboard across noise levels and data scales
  2. Hyperparameter sensitivity analysis (Ridge α, LightGBM lr, CNN kernel)
  3. Common pitfalls: StandardScaler kills LightGBM, large kernels hurt CNN
  4. Recommended configurations with expected $R^2$
  5. Decision tree for architecture selection (a hypothetical sketch follows the Key Numbers table below)

Evidence Map:

| Asset | Path | Description |
|---|---|---|
| Design principles | design/principles.md | Consolidated guidelines |
| Ridge hub | logg/ridge/ridge_hub_20251223.md | Linear baseline |
| CNN experiments | logg/cnn/exp_cnn_dilated_kernel_sweep_20251201.md | Kernel sweep |
| LightGBM | logg/lightgbm/exp_lightgbm_hyperparam_sweep_20251129.md | Tree baseline |
| Benchmark | logg/benchmark/benchmark_hub_20251205.md | Cross-model comparison |

Key Numbers:

| Model | 32k R² | 100k R² | 1M R² | Best Config |
|---|---|---|---|---|
| Ridge | 0.458 | 0.486 | 0.50 | α=200/3e4/1e5 |
| LightGBM | 0.536 | 0.558 | - | lr=0.05, n=1000/2500 |
| MLP | 0.498 | 0.551 | - | Residual, [256, 64] |
| CNN (k=9) | 0.657 | - | - | AdaptiveAvgPool |
| ViT | - | - | 0.71 | L6-H256, 1D patch |
| Oracle MoE | - | - | 0.62 | 9-expert |
| Fisher ceiling | - | - | 0.89 | Theoretical |
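
Read as decision logic, the leaderboard suggests roughly the following. This is a hypothetical distillation of the numbers above, not the paper's actual decision tree:

```python
def recommend_model(n_train, noise_level):
    """Hypothetical architecture-selection logic implied by Key Numbers."""
    if noise_level == 0.0:
        return "Ridge"               # mapping is linear: R^2 = 0.999 clean
    if n_train >= 1_000_000:
        return "ViT (L6-H256)"       # scaling-law regime: R^2 = 0.71
    if n_train >= 32_000:
        return "CNN (k=9)"           # best mid-scale deep model: R^2 = 0.657
    return "LightGBM (raw flux, no StandardScaler)"   # small-data baseline
```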

Reproducibility & Experiment Protocol

Data Preprocessing

  1. Normalization: Median flux normalization per spectrum
  2. Noise injection: $y = x + \text{noise\_level} \cdot \sigma \odot \epsilon$, where $\sigma$ is the per-pixel error from the PFS simulator
  3. No StandardScaler for LightGBM (degrades performance by 36%)

Training Protocol

  • Train/val split: 90/10 unless otherwise specified in the experiment log
  • Early stopping: patience=20 on validation $R^2$
  • Seeds: Fixed (42) for reproducibility, multi-seed for confidence intervals

Evaluation

  • Primary metric: $R^2$ on held-out test set
  • Bootstrap CI: 1000 samples for confidence intervals
  • Mask-aligned comparison: MoE vs global evaluated on identical subsets
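
A sketch of the bootstrap protocol (percentile method with 1000 resamples; the interface and seed here are illustrative):

```python
import numpy as np

def bootstrap_r2_ci(y_true, y_pred, n_boot=1000, alpha=0.05, seed=42):
    """Percentile bootstrap confidence interval for R^2."""
    rng = np.random.default_rng(seed)
    n = len(y_true)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)      # resample with replacement
        yt, yp = y_true[idx], y_pred[idx]
        ss_res = np.sum((yt - yp) ** 2)
        ss_tot = np.sum((yt - yt.mean()) ** 2)
        stats[b] = 1.0 - ss_res / ss_tot
    lo, hi = np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lo, hi
```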

Cross-Repository Structure

| Repository | Purpose | Relation |
|---|---|---|
| ~/VIT/ | ViT/CNN/MLP training | → results here |
| ~/BlindSpotDenoiser/ | Denoiser pretraining | → latents here |
| ~/SpecDiffusion/ | Diffusion experiments | → results here |
| This repo | Knowledge center | ← consolidates all |

Repository Map

Physics_Informed_AI/
├── EN/                          # 📖 English reports (START HERE)
│   ├── logg_main_*_en.md        #    Master experiment log
│   └── moe_hub_*_en.md          #    MoE research hub
│
├── logg/                        # 📁 Experiment logs by topic
│   ├── scaling/                 #    Scaling laws + Fisher ceiling
│   │   ├── fisher_hub_*.md      #    Theoretical upper bound
│   │   ├── scaling_hub_*.md     #    Data scaling analysis
│   │   └── exp/                 #    Individual experiments
│   ├── vit/                     #    Vision Transformer experiments
│   ├── moe/                     #    Mixture of Experts
│   ├── diffusion/               #    Diffusion denoising
│   ├── cnn/                     #    CNN architecture
│   ├── ridge/                   #    Ridge regression baseline
│   ├── lightgbm/                #    Tree model baseline
│   ├── pca/                     #    Dimensionality reduction
│   ├── NN/                      #    MLP experiments
│   ├── distill/                 #    Representation learning
│   ├── gta/                     #    Global Tower Architecture
│   ├── noise/                   #    Feature selection
│   ├── train/                   #    Training strategies
│   └── benchmark/               #    Cross-model comparison
│
├── design/                      # 📐 Design principles
│   └── principles.md            #    Consolidated guidelines
│
├── data/                        # 📊 Data documentation
│   └── bosz50k/                 #    Grid definitions
│
├── status/                      # 📋 Project tracking
│   ├── kanban.md                #    Experiment kanban
│   └── next_steps.md            #    Prioritized tasks
│
└── _backend/                    # 🔧 Templates
    └── template/                #    Report templates

How to Cite

@misc{physics_informed_ai_2025,
  author = {Viska Wei},
  title = {Physics-Informed AI for Stellar Parameter Prediction: 
           A Systematic Study from Ridge Regression to Vision Transformers},
  year = {2025},
  publisher = {GitHub},
  url = {https://github.com/ViskaWei/Physics_Informed_AI}
}

For specific tracks, cite the corresponding experiment reports.


Roadmap to Submission

Week 1-2 (Track A: ViT)

  • Complete 200-epoch training (Run 1 MSE, Run 2 L1)
  • Generate attention visualization
  • Scaling law figure (R² vs train size)
  • Draft introduction + methods

Week 3-4 (Track C: Fisher Ceiling)

  • Model efficiency by SNR bin figure
  • Weighted loss experiment (approach ceiling)
  • Draft theory section

Week 5-6 (Track B: MoE)

  • 100k scale replication
  • Full coverage solution (10th out-of-range expert)
  • Draft MoE methodology

Ongoing

  • Generalization to real spectra (LAMOST/APOGEE)
  • Multi-parameter extension ($T_{\text{eff}}$, [M/H])

Missing Information Checklist

The following items are marked [TBD] in this README and require completion:

| Item | Location | Required Action |
|---|---|---|
| ViT attention maps | Track A, Figure 3 | Generate with trained model |
| Scaling law figure | Track A, Figure 4 | Run 100k/500k/1M comparison |
| Model efficiency heatmap | Track C, Figure 3 | Compute $R^2/R^2_{\max}$ by SNR bin |
| Downstream $\log g$ bias | Track D, Figure 4 | Run parameter estimation on denoised spectra |
| 9-bin coverage heatmap | Track B, Figure 4 | Visualize expert assignments |
| Real spectra validation | All tracks | LAMOST/APOGEE transfer experiments |

Author

Viska Wei
Johns Hopkins University


Last Updated: 2025-12-27
Total Experiments: 430+ configurations
Current SOTA: ViT $R^2=0.71$ @ 1M samples, noise_level=1.0
