matsearch provides an API for searching materials based on their composition using deep learning and material science techniques. It leverages a FAISS (Facebook AI Similarity Search) index for efficient similarity searching and DeepChem for feature extraction of material compositions. This system is designed to aid in the discovery and analysis of new material compositions, drawing inspiration from recent advances in AI-driven material science research.
The faiss.index and feature_vectors.npy files were pre-generated from a dataset of 380,000 materials from DeepMind's GNoME project, so the api can be used directly without first running vectorize and create_index.
The project consists of several key services: api, vectorize and create_index.
To run the api, execute the following command:
./start.sh api

This will build a Docker container and start the API service, accessible on port 8080.
To search for materials similar to a given composition, send a POST request to the /search endpoint with the composition data:
curl -X POST http://localhost:8080/search -H "Content-Type: application/json" -d '{"composition": "KCl"}'

The response includes two key pieces of information:
- distances: A list of distances from the query composition to the similar materials found. Lower values indicate closer similarity to the queried composition.
- similar: A list of similar material compositions.
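The same request can also be issued from Python. The following is a minimal sketch using only the standard library. It assumes the API is running locally on port 8080 and that the distances and similar lists are parallel (entry i of one describes entry i of the other), which the description above suggests but does not state outright. The helper names build_search_request, parse_search_response and search are hypothetical and not part of matsearch.

```python
import json
import urllib.request

# Assumed base URL: the API started by ./start.sh api on port 8080.
SEARCH_URL = "http://localhost:8080/search"


def build_search_request(composition, url=SEARCH_URL):
    """Build the POST request for /search without sending it."""
    payload = json.dumps({"composition": composition}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_search_response(body):
    """Pair each similar composition with its distance, closest first.

    Assumes "similar" and "distances" are parallel lists.
    """
    result = json.loads(body)
    pairs = zip(result["similar"], result["distances"])
    return sorted(pairs, key=lambda pair: pair[1])


def search(composition, url=SEARCH_URL, timeout=10):
    """Send the request and return (composition, distance) tuples."""
    request = build_search_request(composition, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_search_response(response.read())
```

With the API running, search("KCl") returns a list of (composition, distance) tuples built from a response shaped like this example: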
{
"distances": [
0.0023
],
"similar": [
"NaCl"
]
}

The vectorize service is responsible for processing the material compositions and converting them into feature vectors. This is done using the ElementPropertyFingerprint from DeepChem, which creates a fingerprint based on elemental stoichiometry.
Execute:
./start.sh vectorize

This will read material compositions from a CSV file, featurize each composition, and save the resulting feature vectors as a NumPy array.
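The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual vectorize code: the CSV column name composition, the function names, and the float32 cast are all assumptions. Only the feature_vectors.npy file name and the use of DeepChem's ElementPropertyFingerprint come from this README. DeepChem and NumPy are imported lazily so the CSV helper can be used without them installed.

```python
import csv


def read_compositions(csv_path, column="composition"):
    """Return the non-empty values of one CSV column as strings.

    The column name "composition" is an assumption about the input file.
    """
    values = []
    with open(csv_path, newline="") as handle:
        for row in csv.DictReader(handle):
            value = (row.get(column) or "").strip()
            if value:
                values.append(value)
    return values


def featurize_compositions(compositions, featurizer=None):
    """Turn composition strings such as "KCl" into feature vectors."""
    if featurizer is None:
        # Lazy import: DeepChem (with matminer and pymatgen) is only
        # needed when no featurizer is passed in.
        import deepchem as dc

        featurizer = dc.feat.ElementPropertyFingerprint(data_source="matminer")
    return featurizer.featurize(compositions)


def vectorize(csv_path, output_path="feature_vectors.npy"):
    """Read a CSV, featurize every composition, and save a .npy file."""
    import numpy as np

    vectors = featurize_compositions(read_compositions(csv_path))
    # float32 is assumed here because FAISS indexes expect float32 input.
    np.save(output_path, np.asarray(vectors, dtype="float32"))
    return vectors
```

Compositions that DeepChem cannot parse may come back as empty arrays, so a real pipeline would likely need to filter those rows out before saving.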
The create_index service creates a FAISS index from the feature vectors generated by vectorize. This index is used for efficient similarity searches in the api.
Execute:
./start.sh create_index

This will load the feature vectors, create a FAISS index, and save it for use by the api.
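As a rough illustration of these steps, the sketch below builds an index from the saved vectors and shows how search results could be mapped to the /search response shape. The README does not say which FAISS index type the project uses, so IndexFlatL2 (exact search, squared Euclidean distances) is an assumption, as are the function names and the to_response helper. The faiss.index and feature_vectors.npy file names are the ones mentioned earlier.

```python
def build_index(vectors):
    """Build an exact L2 index over an array of shape (n, dim)."""
    import faiss
    import numpy as np

    # FAISS expects contiguous float32 data.
    vectors = np.ascontiguousarray(vectors, dtype="float32")
    index = faiss.IndexFlatL2(vectors.shape[1])
    index.add(vectors)
    return index


def create_index(vectors_path="feature_vectors.npy", index_path="faiss.index"):
    """Load the saved vectors, index them, and write the index to disk."""
    import faiss
    import numpy as np

    index = build_index(np.load(vectors_path))
    faiss.write_index(index, index_path)
    return index


def to_response(distances, indices, compositions):
    """Map one row of index.search output to the /search response shape.

    FAISS pads missing results with index -1, so those are skipped.
    `compositions` is the list of names in the same order as the vectors.
    """
    hits = [
        (float(dist), compositions[idx])
        for dist, idx in zip(distances, indices)
        if idx >= 0
    ]
    return {
        "distances": [dist for dist, _ in hits],
        "similar": [name for _, name in hits],
    }
```

At query time, an index loaded with faiss.read_index("faiss.index") can be searched with D, I = index.search(query, k), where query is a float32 array of shape (1, dim). Passing D[0], I[0] and the composition list to to_response then yields a dictionary like the example response shown earlier.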
- DeepChem: Used for featurizing material compositions.
- FAISS: Provides efficient similarity search for high-dimensional vectors.
- Flask: Serves the API for searching material compositions.
- Pandas & NumPy: For data manipulation and array operations.
- Docker: For containerizing and orchestrating the services.
Contact us for clarifications or contributions.