Statistical analysis with meteorite data
In this project we analyse a dataset of meteorites in order to make predictions about a future strike on our planet. More specifically, the dataset has been provided by NASA and contains recorded meteorite impacts on Earth. After exploring it, we will estimate the chance that a high-impact meteorite will strike Earth within the next 1000 years. A high-impact meteorite is taken to be one with a diameter greater than 1 km.
Dataset
The dataset for this project has been provided by NASA. A sample of the imported dataset can be seen below:
We observe that the dataset contains 45715 records. We can get more detailed information about the values of this dataset:
The dataset contains 10 different variables related to a meteorite impact.
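The loading and inspection steps described above could look roughly like the following sketch. It is not the project's actual code: the column names are an assumption based on the public "Meteorite Landings" export, the file path in the comment is hypothetical, and the four rows are made-up values used only so the snippet runs on its own.

```python
import io

import pandas as pd

# Made-up sample rows standing in for the real file. The ten column names
# are an assumption, not taken from the project itself.
SAMPLE_CSV = """name,id,nametype,recclass,mass,fall,year,reclat,reclong,GeoLocation
Sample A,1,Valid,L5,20.0,Fell,1880,50.0,6.0,"(50.0, 6.0)"
Sample B,2,Valid,H6,700.0,Fell,1951,56.0,10.0,"(56.0, 10.0)"
Sample C,3,Valid,EH4,100000.0,Found,1952,54.0,-113.0,"(54.0, -113.0)"
Sample D,4,Valid,L6,,Found,1990,,,
"""

# With the real data this would be something like
# pd.read_csv("meteorites.csv")  (hypothetical path).
df = pd.read_csv(io.StringIO(SAMPLE_CSV))

print(df.head())      # a sample of the imported rows
print(df.shape)       # (number of records, number of variables)
df.info()             # column dtypes and non-null counts
print(df.describe())  # summary statistics of the numeric columns
```

On the full dataset, `df.shape` is where the record count and the number of variables come from, and `df.info()` shows which columns have missing values.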
Data visualisation
Let’s also explore the data visually with a scatter matrix.
In the scatter plot of latitude against longitude, we can see the shape of the globe.
Since mass is a positive value that spans many orders of magnitude, it will probably be easier to look at log mass instead of mass itself.
Now we can calculate and visualise the number of meteorite impacts per year from 1980 until 2020.
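A scatter matrix like the one discussed here can be produced with `pandas.plotting.scatter_matrix`. The sketch below uses randomly generated stand-in data, with column names that are assumptions, so it will not reproduce the globe-like pattern of the real coordinates.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic stand-in for the numeric columns of the dataset (assumed names).
df = pd.DataFrame({
    "mass": rng.lognormal(mean=5.0, sigma=2.0, size=200),
    "year": rng.integers(1900, 2014, size=200),
    "reclat": rng.uniform(-90, 90, size=200),
    "reclong": rng.uniform(-180, 180, size=200),
})

# One panel per pair of columns, with histograms on the diagonal.
axes = pd.plotting.scatter_matrix(df, figsize=(8, 8), alpha=0.5)
```

The `reclat` versus `reclong` panel is the one in which the real data would show the outline described above.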
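A minimal sketch of the log transform, assuming masses in grams and the natural logarithm (the text does not say which base is used). The masses are made-up values. Missing and non-positive masses are dropped first, because the logarithm is undefined for them.

```python
import numpy as np

# Made-up masses in grams, including a missing and a zero value.
mass = np.array([0.5, 21.0, 720.0, 1.07e5, 6.0e7, np.nan, 0.0])

# Keep only finite, strictly positive masses before taking the log.
valid = np.isfinite(mass) & (mass > 0)
log_mass = np.log(mass[valid])

print(log_mass)
```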
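Counting impacts per year could be done along these lines. The years below are made-up values and the variable names are assumptions. Years with no recorded impact are filled with zero so that the series covers every year from 1980 to 2020.

```python
import pandas as pd

# Made-up year values, including one missing entry and years outside the range.
years = pd.Series([1979, 1980, 1980, 1995, 1995, 1995, 2013, 2021, None], name="year")

# Drop missing years and keep only 1980 to 2020 inclusive.
clean = years.dropna().astype(int)
clean = clean[(clean >= 1980) & (clean <= 2020)]

# Count impacts per year, inserting zeros for years without any record.
counts = clean.value_counts().reindex(range(1980, 2021), fill_value=0)

print(counts.loc[[1980, 1995, 2013]])
```

Calling `counts.plot()` would then give the per-year visualisation, and `counts.mean()` gives a yearly rate.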
Mass distribution
Now it’s time to quantify the log-mass distribution. We can start by checking whether a normal distribution fits well.
Since the normal distribution is not the best fit, we can try both a skew-norm and a log-norm distribution.
So either the log-norm or skew-norm looks like an adequate fit to the data.
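The comparison of the three candidate distributions can be sketched with `scipy.stats`. This is an illustration on synthetic log-mass values, not the project's actual fit, and using the summed log-likelihood as the measure of fit is an assumption here.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Synthetic, right-skewed stand-in for the log-mass values.
log_mass = stats.skewnorm.rvs(a=4.0, loc=2.0, scale=3.0, size=2000, random_state=rng)

candidates = {
    "norm": stats.norm,
    "skewnorm": stats.skewnorm,
    "lognorm": stats.lognorm,
}

fits = {}
for name, dist in candidates.items():
    params = dist.fit(log_mass)  # maximum-likelihood parameter estimates
    loglike = np.sum(dist.logpdf(log_mass, *params))  # higher means a better fit
    fits[name] = (params, loglike)
    print(f"{name}: params={np.round(params, 3)}, log-likelihood={loglike:.1f}")
```

Plotting each fitted `dist.pdf` over a histogram of `log_mass` gives the visual check described in this section.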
Calculation of probability
We chose the log-norm distribution to calculate the probability that a meteorite with a diameter greater than 1 km will impact Earth in the next 1000 years.
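To use a mass distribution for a diameter threshold, the 1 km diameter first has to be converted into a mass. The sketch below assumes a spherical body with a bulk density of 1600 kg/m³, masses in grams and a natural-log mass scale. None of these is stated in the text. The log-norm parameters are illustrative placeholders, not fitted values.

```python
import math

from scipy import stats

DENSITY_KG_M3 = 1600.0  # assumed bulk density, not given in the text
DIAMETER_M = 1000.0     # the 1 km high-impact threshold

# Mass of a sphere with the threshold diameter, converted to grams.
radius_m = DIAMETER_M / 2
volume_m3 = 4.0 / 3.0 * math.pi * radius_m**3
mass_g = DENSITY_KG_M3 * volume_m3 * 1000.0
log_mass_threshold = math.log(mass_g)

# Placeholder (shape, loc, scale) values; in practice these would come from
# fitting scipy.stats.lognorm to the observed log-mass values.
shape, loc, scale = 0.1, -20.0, 23.0

# Survival function: probability that a single impact exceeds the threshold.
p_high_mass = stats.lognorm.sf(log_mass_threshold, shape, loc=loc, scale=scale)

print(f"log-mass threshold: {log_mass_threshold:.2f}")
print(f"P(single impact is high mass): {p_high_mass:.3e}")
```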
So we now have the probability that an asteroid is above a certain mass when it hits Earth. But to answer the question “What is the probability that one or more asteroids of high mass strike Earth in 1000 years?” we need to factor in the actual time component. Assume that we expect N impacts in the next 1000 years and that the impacts are independent of one another.
P(≥ 1 highmass) = 1 - P(0 highmass) = 1 - P(N not highmass) = 1 - P(not_highmass)^N
So to give a number, we need to calculate N from the yearly rate, number of years, and our detection efficiency and use that with the probability that any given impact is not high mass.
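The final step can be sketched as a small function. Dividing the observed yearly rate by the detection efficiency to estimate the true number of impacts is an assumption about how the efficiency enters, and the function and parameter names are hypothetical. The numbers in the usage example are arbitrary and are not results from the dataset.

```python
import math


def prob_at_least_one(p_high_mass, yearly_rate, num_years, detection_efficiency):
    """Probability of one or more high-mass impacts, assuming independent impacts.

    N is estimated as observed yearly rate * years / detection efficiency,
    which is an assumed way of correcting for undetected impacts.
    """
    if not 0.0 < detection_efficiency <= 1.0:
        raise ValueError("detection_efficiency must be in (0, 1]")
    if not 0.0 <= p_high_mass <= 1.0:
        raise ValueError("p_high_mass must be in [0, 1]")
    if p_high_mass == 1.0:
        return 1.0

    n_impacts = yearly_rate * num_years / detection_efficiency

    # 1 - (1 - p)^N, computed with log1p/expm1 so very small p is not lost
    # to floating-point rounding.
    return -math.expm1(n_impacts * math.log1p(-p_high_mass))


# Arbitrary illustrative inputs: 100 observed impacts per year, 10% detected.
example = prob_at_least_one(1e-6, yearly_rate=100, num_years=1000, detection_efficiency=0.1)
print(f"{example:.4f}")
```

Working through `log1p` and `expm1` matters here because the single-impact probability is tiny, so evaluating `1 - (1 - p)**N` directly can round `1 - p` to exactly 1.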
A more detailed description of this project’s implementation in Python can be found in this GitHub repository: Link to GitHub repository