diff --git a/Thoughts_on_Scaling_data-underground.md b/Thoughts_on_Scaling_data-underground.md
new file mode 100644
index 0000000..e1dd828
--- /dev/null
+++ b/Thoughts_on_Scaling_data-underground.md
@@ -0,0 +1,139 @@
+#### What is this file?
+This markdown file is something akin to a Slack comment in purpose, but way too long for that format, so I'm putting it here. I apologize for the rambling.
+
+This document is not meant as a directive for data-underground at all, but more as a series of ideas from a single person to consider. It is, to some extent, a brain dump.
+
+- Justin Gosses
+
+#### Introduction / Why I'm writing this?
+After reading the original document in this repo titled open-data-guidelines.md, it bugged me a little for reasons I had a hard time narrowing down. After giving it some thought, I realized I felt there were things left out due to the focus on the dataset.
+
+The focus on the dataset makes sense, of course, but I think you get to a better place by not just asking what characteristics the dataset should have, but also considering the site that hosts the dataset, as well as the different types of members in the community around both the dataset and the site as a whole.
+
+This document is an attempt to suggest other perspectives to consider when creating an open-data site for datasets geared to geoscience + coding, beyond what characteristics the dataset should have.
+
+#### Background That Informs My Requirements/Needs
+First, I co-lead the Houston data visualization meetup, which means I spend some time searching for good datasets people would enjoy visualizing during our datajams over the course of about 4 hours. Second, I help maintain data.nasa.gov, which has approximately 40,000 datasets and gets harvested into data.gov, which has almost a quarter of a million.
+
+#### Problems With Scale
+I typically find myself less bothered by the characteristics of individual datasets and more by my experiences, or other people's experiences, of trying to work with open datasets in aggregate. Searching through them, evaluating them, organizing them, and aggregating them is often very difficult due to constraints put in place early. Sometimes constraints occur because certain metadata wasn't encouraged. Other times sites lack certain filtering capabilities. Other times aspects of the datasets are not programmatically accessible. I less often find myself working with a specific dataset and saying, "oh, it would be great if this particular dataset had blank".
+
+These concerns are less obvious with only 17 datasets on https://dataunderground.org/dataset as of today. These problems become more apparent as the number of individual datasets grows beyond users' willingness to read through all of them.
+
+Additionally, these types of issues increase as the percentage of datasets that can be aggregated into larger datasets increases. If all the datasets are completely separate or different in domain and/or format, then these issues are less of a problem.
+
+#### Minimizing Time-To-Start is Maximizing Use Rate
+A hypothesis I have based on my own experiences and, to be honest, relatively little real data is that a lot of the most used open-data is just the easiest to use.
+
+This is what we see with data.nasa.gov. The most used datasets are typically small ones, only one file, in CSV format, that are harvested into sites with great user interfaces, like Kaggle and data.world, which keeps the evaluation time and time to start minimal.
+
+A lot of datasets are hard to discover, and "discover" is often a more accurate word than "find", as a significant amount of open-data use comes from people who didn't already know a dataset existed and learned of it only through the open-data site, or through someone else who found it there.
+
+To maximize the rate of discoverability, you need to make the amount of time to get there shorter. This requires the ability to sort and filter datasets in ways that correlate with user needs.
+
+[STRONGLY HELD OPINION] The search functionality of some open-data sites is more geared to finding datasets than discovering them, which impacts the user experience.
+
+#### Discoverability Problems That Occur With More Datasets on a Site
+
+##### A. How do you find datasets based on task?
+Some users will care less about a dataset's content than about knowing it has labels and can be modeled as a time series problem. How do they find all the datasets that meet that definition?
+
+##### B. How do you find datasets based on data format or data structure?
+Some people will want LAS 2.0 well logs. Others will absolutely need wells with well paths included, which LAS file formats won't have.
+
+##### C. Can you find all the versions of a dataset?
+If a user stumbles upon a preprocessed dataset like this one will they also see that there is an original dataset here