A lightweight Convolutional Neural Network (CNN) that predicts photometric redshift (photo-z) from multi-band galaxy images, trained on data from the Sloan Digital Sky Survey (SDSS DR12). This approach offers a fast, scalable alternative to traditional spectroscopic redshift estimation.
Understanding a galaxy's redshift is crucial in cosmology: it tells us how far away the galaxy is and how long ago its light was emitted. But measuring redshift spectroscopically requires expensive instrument time and scales poorly to large surveys.
This project presents a faster, image-based alternative using deep learning. We built a CNN that takes 32x32x5 galaxy image cubes (small images of the same galaxy captured in the five SDSS photometric filters: u, g, r, i, z) and learns to predict how redshifted, and hence how distant, the galaxy is.
We designed a CNN that works like a learned pattern detector: it looks at the galaxy images and estimates the redshift directly, without manual calculations or domain-specific fitting formulas.
Here’s what happens step-by-step:
- Input Layer: A 32x32 image with 5 channels (one for each photometric band).
- Conv2D Layer 1: Learns 32 patterns using 5x5 filters, followed by ReLU activation.
- MaxPooling2D: Shrinks image size to retain only important info (2x2 pooling).
- Conv2D Layer 2: Learns 64 more complex patterns, again with 5x5 filters + ReLU.
- MaxPooling2D: Again reduces size to focus on most meaningful parts.
- Flatten: Converts the data into a 1D array for the next layers.
- Dense Layer 1: Fully connected layer with 220 neurons + ReLU.
- Dropout: Randomly drops 50% of the connections to prevent overfitting.
- Dense Layer 2: Another fully connected layer with 64 neurons + ReLU.
- Dropout: Another dropout layer to regularize.
- Output Layer: Single neuron with Sigmoid activation to predict a value between 0 and 1 (normalized redshift).
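The layer list above can be sketched directly in Keras. This is a minimal reconstruction from the description; details not stated in the text (padding mode, optimizer, loss) are assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_photoz_cnn(input_shape=(32, 32, 5)):
    """CNN matching the architecture described above (assumed 'same' padding)."""
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, (5, 5), activation="relu", padding="same"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (5, 5), activation="relu", padding="same"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(220, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(64, activation="relu"),
        layers.Dropout(0.5),
        # Sigmoid keeps the prediction in [0, 1] (normalized redshift).
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="mse", metrics=["mae"])
    return model
```

The MSE loss and Adam optimizer are reasonable defaults for a bounded regression target like this, but are not specified in the text.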
To predict how far away a galaxy is (its redshift), we first prepare the data so our neural network model can understand it.
- Each galaxy is converted into a small image of size 32x32 pixels.
- Instead of the usual three color channels (Red, Green, Blue), we use the five SDSS filters:
  - u (ultraviolet)
  - g (green)
  - r (red)
  - i (near-infrared)
  - z (near-infrared, longer wavelength than i)
- This gives us a 5-channel image tensor of shape 32x32x5.
Below is an example of a galaxy shown in the five SDSS photometric bands:
- All image values are scaled (normalized) between 0 and 1.
- Redshift values are also normalized so the model can learn efficiently.
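The normalization step can be sketched as below. Min-max scaling per image cube and a fixed `z_max` divisor for the redshift are assumptions; the project's exact scaling scheme may differ.

```python
import numpy as np

def normalize_images(cubes):
    """Scale each 32x32x5 image cube into [0, 1] (per-cube min-max scaling)."""
    cubes = cubes.astype("float32")
    lo = cubes.min(axis=(1, 2, 3), keepdims=True)
    hi = cubes.max(axis=(1, 2, 3), keepdims=True)
    # Small epsilon guards against division by zero for constant cubes.
    return (cubes - lo) / (hi - lo + 1e-8)

def normalize_redshift(z, z_max):
    """Map redshift into [0, 1] so the sigmoid output can match it."""
    return np.asarray(z) / z_max
```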
- The model takes the 5-band galaxy image cube as input.
- It learns patterns related to galaxy shape, brightness, and combinations across bands.
- It outputs a normalized redshift prediction (a number between 0 and 1).
- This value is later scaled back to recover the physical redshift.
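If redshifts were normalized by dividing by a maximum value `z_max` (an assumption consistent with the sigmoid output range), recovering the physical redshift is a one-line inverse:

```python
import numpy as np

def denormalize_redshift(z_norm, z_max):
    """Invert the [0, 1] scaling to recover the physical redshift."""
    return np.asarray(z_norm) * z_max
```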
This whole process removes the need for hand-crafted features or traditional redshift fitting.
We tested our model on two versions of the dataset:
- Full Dataset: Includes all galaxies, even those with extreme redshift values.
- Clipped Dataset: Removes galaxies with very high redshift (e.g., z > 0.4) to reduce outliers.
| Metric | Full Dataset | Clipped Dataset |
|---|---|---|
| Mean Absolute Error (MAE) | 0.0304 | 0.0556 |
| Mean Squared Error (MSE) | 0.0017 | 0.0056 |
| R-squared Score | 0.9041 | 0.8826 |
| Precision | 0.0269 | 0.0427 |
| Catastrophic Outliers (\|Δz\| > 0.15) | | |
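The table's metrics can be computed with plain NumPy. This is an illustrative sketch; the outlier criterion follows the table's |Δz| > 0.15 definition (some photo-z work uses |Δz|/(1+z) instead), and `photoz_metrics` is a hypothetical helper name.

```python
import numpy as np

def photoz_metrics(z_true, z_pred, outlier_threshold=0.15):
    """MAE, MSE, R^2, and the catastrophic-outlier fraction |dz| > threshold."""
    z_true, z_pred = np.asarray(z_true), np.asarray(z_pred)
    dz = z_pred - z_true
    mae = np.abs(dz).mean()
    mse = (dz ** 2).mean()
    ss_res = (dz ** 2).sum()
    ss_tot = ((z_true - z_true.mean()) ** 2).sum()
    r2 = 1.0 - ss_res / ss_tot
    outlier_frac = (np.abs(dz) > outlier_threshold).mean()
    return {"mae": mae, "mse": mse, "r2": r2, "outlier_frac": outlier_frac}
```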
Conclusion: Including all redshift values — even the extreme ones — helped the model generalize better. It learned a wider range of galaxy types and redshifts.
- Sloan Digital Sky Survey (SDSS)
- Original research and code structure inspired by peer collaboration.