I am going to use a figure from my own thesis, Figure 28 (see the video below).
I was looking at the properties of neighbor galaxies around quasars. I calculated the separation, in comoving megaparsecs (Mpc), between each quasar and its neighbors. That is the x-axis, the horizontal axis.
On the y-axis, I show you the fraction of the neighbors within that distance.
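To make the y-axis concrete, here is a minimal Python sketch of "the fraction of neighbors within a given distance". It is an illustration only, not the code behind the thesis figure: the function name `fraction_within` and the sample separations are made up for this example.

```python
def fraction_within(separations_mpc, radius_mpc):
    """Return the fraction of neighbors whose separation is <= radius_mpc."""
    if not separations_mpc:
        raise ValueError("need at least one neighbor separation")
    n_inside = sum(1 for s in separations_mpc if s <= radius_mpc)
    return n_inside / len(separations_mpc)


# Hypothetical comoving separations (Mpc) between one quasar and its neighbors.
separations = [0.4, 1.1, 1.8, 2.5, 3.9]

# Evaluate the fraction at a few radii; each radius is one x-axis value.
radii = [1.0, 2.0, 4.0]
curve = [(r, fraction_within(separations, r)) for r in radii]
```

Each pair in `curve` is one point of the plot: a separation on the x-axis and the fraction of neighbors inside that separation on the y-axis.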
To produce this figure I had to use data from three sources.
- The first source was my Stone et al. 2023 paper, with data from the GAMA archive.
- The second source was my earlier Stone et al. 2021 paper, with data from fresh observations we collected ourselves.
- The third source was data from a paper by someone else, Fogasy et al. 2017.
First, I had to understand the material: what objects were observed, what was measured, how, and so on. I did that by reading the paper or, for my own data, by going back over the research I was doing.
Now, the juicy part. In order to combine the data tables from three different sources, I had to make sure that the tables had the same format, so that a column holding one property does not get mixed up with values of some other property.
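One way to bring tables into the same format is to rename every source's columns to a single shared set of names before combining anything. The sketch below shows the idea in plain Python. All column names and values here are hypothetical; the real tables will have their own labels.

```python
# Hypothetical column names: each source labels the same quantities differently.
COLUMN_MAPS = {
    "stone2023": {"qso": "quasar_id", "sep": "separation"},
    "stone2021": {"QSO_name": "quasar_id", "dist": "separation"},
    "fogasy2017": {"ID": "quasar_id", "d_proj": "separation"},
}
COMMON_COLUMNS = ["quasar_id", "separation", "source"]


def harmonize(rows, source):
    """Rename one source's columns to the common names and tag each row."""
    mapping = COLUMN_MAPS[source]
    out = []
    for row in rows:
        missing = set(mapping) - set(row)
        if missing:
            # Fail loudly rather than silently producing a half-filled row.
            raise KeyError(f"{source}: missing columns {sorted(missing)}")
        new_row = {common: row[original] for original, common in mapping.items()}
        new_row["source"] = source  # remember where every row came from
        out.append(new_row)
    return out


# Made-up example rows from two of the sources.
table_2023 = [{"qso": "Q1", "sep": 0.8}]
table_2017 = [{"ID": "Q7", "d_proj": 3.2}]

combined = harmonize(table_2023, "stone2023") + harmonize(table_2017, "fogasy2017")
```

Keeping a `source` column in the combined table makes it easy to trace any odd-looking value back to the table it came from.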
Then, I had to make sure that the units were the same and that the conversions were done correctly. I used Python tools (e.g., astropy) to do that, but I also read and re-read the sources to make sure I knew exactly which units were used, for example, for the distance measurements.
When combining tables, I also had to make sure that the cells with blank values were appropriately accounted for! I frequently use checkpoints to test and validate that my table manipulations are correct and that nothing is going astray.
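As a rough illustration of the unit step, here is a dependency-free Python sketch. In practice `astropy.units` does this kind of bookkeeping for you; plain Python is used here only to keep the example self-contained. The helper names, the input values, and the idea that one source quotes proper (physical) distances in kpc are all assumptions made for the example.

```python
# Conversion factors to megaparsecs (these follow directly from the prefixes).
TO_MPC = {"pc": 1e-6, "kpc": 1e-3, "Mpc": 1.0}


def to_mpc(value, unit):
    """Convert a distance to Mpc, failing loudly on an unknown unit."""
    try:
        return value * TO_MPC[unit]
    except KeyError:
        raise ValueError(f"unknown distance unit: {unit!r}") from None


def proper_to_comoving(distance, redshift):
    """Comoving separation = proper (physical) separation * (1 + z)."""
    return distance * (1.0 + redshift)


# Hypothetical case: a source lists a proper separation of 500 kpc at z = 1.
sep_mpc = to_mpc(500.0, "kpc")                   # proper separation in Mpc
sep_comoving = proper_to_comoving(sep_mpc, 1.0)  # comoving separation in Mpc
```

The unknown-unit error is deliberate: a typo in a unit label should stop the script rather than quietly pass a wrong number into the combined table.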
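Here is a small sketch of what handling blanks and adding checkpoints can look like. The list of blank markers is an assumption: every source has its own convention for missing values, so check each table before relying on a list like this one. The rows are made up.

```python
import math

# Assumed placeholders for "no value"; check what each source actually uses.
BLANK_MARKERS = {"", "--", "nan", "NaN", "N/A"}


def parse_value(cell):
    """Return a float, or None for a blank or non-finite cell.

    Text that is neither blank nor a number raises ValueError.
    """
    if cell is None:
        return None
    text = str(cell).strip()
    if text in BLANK_MARKERS:
        return None
    value = float(text)
    return value if math.isfinite(value) else None


def drop_blank_separations(rows):
    """Keep rows with a usable separation and report how many were dropped."""
    kept = []
    for row in rows:
        sep = parse_value(row["separation"])
        if sep is not None:
            kept.append({**row, "separation": sep})
    return kept, len(rows) - len(kept)


rows = [
    {"quasar_id": "Q1", "separation": "0.8"},
    {"quasar_id": "Q2", "separation": ""},
    {"quasar_id": "Q3", "separation": "--"},
    {"quasar_id": "Q4", "separation": 2.4},
]
clean, n_dropped = drop_blank_separations(rows)

# Checkpoints: every row is accounted for and the surviving values are physical.
assert len(clean) + n_dropped == len(rows)
assert all(r["separation"] >= 0 for r in clean)
```

Whether to drop, flag, or fill blank cells depends on the analysis; dropping is used here only because it is the simplest case to show.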
While one might think that working with a data table is simple, things get complicated when you are working with tables from different sources (even from your own papers!). Plus, it is easy to make a (huge!) mistake if you are dealing with BIG data. And from the technical perspective, you have to learn specialized tools, such as coding in Python, in order to streamline and speed up the table manipulation process.
We want to make sure we are comparing apples to apples! All of this is to make sure that when you combine data from various sources, you are comparing the correct quantities in the correct units.
When I put the data points from the Stone et al. 2023 paper together with those from the earlier work and then created a visual, I saw much more clearly how the 2023 publication focused on very close neighbors of quasars (up to 2 Mpc), while the previous study looked at neighbors at larger separations.
Bonus tip: If you are working with large data tables, it is often useful to test your code or procedure on a subset of the data, or even on a single data point, and then propagate the calculation or manipulation to the full table or array.
Another bonus tip: While you are working on the computer and coding, it is also sometimes useful to draw a schematic of your procedure by hand, so that you can see clearly what you are doing and do not get lost in many lines of code. Needless to say, good documentation is key to keeping yourself organized and to sharing your data later with the public and fellow scientists.
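The "one point, then a subset, then everything" habit can be sketched like this. The table and the conversion step are hypothetical stand-ins for whatever manipulation you are actually doing.

```python
def separation_to_mpc(row):
    """Hypothetical per-row step: convert a separation in kpc to Mpc."""
    return {**row, "separation_mpc": row["separation_kpc"] / 1000.0}


# A made-up table of 1000 rows standing in for a large data table.
table = [{"quasar_id": f"Q{i}", "separation_kpc": 250.0 * i} for i in range(1, 1001)]

# 1. Check one row whose answer is known by hand: 250 kpc = 0.25 Mpc.
first = separation_to_mpc(table[0])
assert first["separation_mpc"] == 0.25

# 2. Check a small subset before touching everything.
subset = [separation_to_mpc(r) for r in table[:10]]
assert all(r["separation_mpc"] > 0 for r in subset)

# 3. Only then propagate to the full table, and confirm no rows were lost.
converted = [separation_to_mpc(r) for r in table]
assert len(converted) == len(table)
```

If step 1 or 2 fails, you find out in a fraction of a second instead of after a long run over the full table.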
– Dr. Maria B. Stone, aka Your astronomer from Helsinki