Statistical analysis with Meteorites data

May 20, 2020

In this post we analyse a dataset about meteorites in order to make predictions about a future strike on our planet. The dataset has been provided by NASA and contains recorded meteorite impacts on Earth. After exploring it, we will estimate the chance that a high-impact meteorite strikes Earth within the next 1000 years. A high-impact meteorite is taken to be one with a diameter greater than 1 km.

Dataset

The dataset for this project was provided by NASA. Part of the imported dataset is shown below:

Figure 1: Part of the imported dataset.

The dataset contains 45715 records. A summary gives more detailed information about its values:

Figure 2: Summary of the imported dataset.

The dataset contains 10 variables describing each meteorite impact.
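The row and column counts above can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the function names `load_rows` and `summarise` are made up for illustration, and nothing here assumes particular column names.

```python
import csv


def load_rows(lines):
    """Read CSV text (an open file or any iterable of lines) into a list of dicts."""
    return list(csv.DictReader(lines))


def summarise(rows):
    """Return the row count, the column names, and the non-empty count per column."""
    columns = list(rows[0].keys()) if rows else []
    non_empty = {
        col: sum(1 for row in rows if (row[col] or "").strip())
        for col in columns
    }
    return {"n_rows": len(rows), "columns": columns, "non_empty": non_empty}
```

With a local copy of the data, passing a file opened with `newline=""` to `load_rows` and the result to `summarise` should report the number of records and variables, and show which columns have missing values.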

Data visualisation

Let’s also explore the data visually with a scatter matrix.

Figure 3: Scatter matrix of imported dataset.

In the scatter plot of latitude against longitude we can see the shape of the globe.

Since mass is a positive value that spans many orders of magnitude, it will probably be easier to look at log mass instead of mass itself.

Figure 4: Histogram of meteorite masses.
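The log transform and the binning behind such a histogram can be sketched as follows. This is an illustration rather than the code used for the figure: the base-10 logarithm and the bin width of 0.5 are assumptions, and records with missing or non-positive mass are simply dropped.

```python
import math
from collections import Counter


def log10_masses(masses):
    """Keep strictly positive masses and return their base-10 logarithms."""
    return [math.log10(m) for m in masses if m is not None and m > 0]


def histogram(values, bin_width=0.5):
    """Count values per bin; each key is the left edge of its bin."""
    counts = Counter(math.floor(v / bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))
```

Calling `histogram(log10_masses(masses))` on the mass column gives the counts per half-decade of mass, which is what a plotted histogram of log mass displays.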

Now we can calculate and visualise the number of meteorite impacts per year from 1980 to 2020.

Figure 5: Number of meteorite impacts per year.
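Counting impacts per year is a simple tally. The sketch below assumes the years are already available as numbers (missing values as `None`); the helper names are hypothetical. Years with no recorded impact are kept with a count of zero so that the average rate is not overstated.

```python
from collections import Counter


def impacts_per_year(years, start=1980, end=2020):
    """Count records per year in [start, end], keeping years with zero records."""
    counts = Counter(
        int(y) for y in years if y is not None and start <= y <= end
    )
    return {year: counts.get(year, 0) for year in range(start, end + 1)}


def mean_yearly_rate(per_year):
    """Average number of recorded impacts per year over the counted window."""
    return sum(per_year.values()) / len(per_year)
```

The mean yearly rate from this window is the kind of number that later feeds into the estimate of how many impacts to expect in 1000 years.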

Mass distribution

Now it’s time to quantify the log-mass distribution. We can start by checking whether a normal distribution fits well.

Figure 6: Comparison of normal and observed distribution.
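A comparison like this can be made numerically as well as visually. The sketch below fits a normal distribution to the log masses with the standard library's `statistics.NormalDist` and, for each bin, returns the observed fraction next to the fraction the fitted normal predicts. The bin edges and function names are illustrative assumptions, not taken from the project's code.

```python
from statistics import NormalDist


def fit_normal(log_masses):
    """Fit a normal distribution using the sample mean and standard deviation."""
    return NormalDist.from_samples(log_masses)


def compare_to_normal(log_masses, edges):
    """For each bin [edges[i], edges[i+1]) return (observed, expected) fractions."""
    dist = fit_normal(log_masses)
    n = len(log_masses)
    result = []
    for lo, hi in zip(edges, edges[1:]):
        observed = sum(1 for v in log_masses if lo <= v < hi) / n
        expected = dist.cdf(hi) - dist.cdf(lo)
        result.append((observed, expected))
    return result
```

Large gaps between the observed and expected fractions, especially in the tails, are a sign that a symmetric normal is not capturing the shape of the data.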

Since the normal distribution does not fit the data well, we can try both a skew-norm and a log-norm.

Figure 7: Comparison of skew-norm and log-norm distributions.

Both the log-norm and the skew-norm look like adequate fits to the data.

Calculation of probability

We chose the log-norm distribution to calculate the probability that a meteorite with a diameter greater than 1 km will impact Earth in the next 1000 years.

Figure 8: Log probability of asteroid being over given mass
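To turn the 1 km diameter threshold into a probability, we first need the corresponding mass and then the upper tail of the fitted distribution. The sketch below makes several assumptions that are not stated in the post: a spherical body, a density of 3000 kg/m³, masses in grams, and a log-norm described by shape, location and scale parameters in the same convention as `scipy.stats.lognorm`. The parameter values would come from the fit.

```python
import math


def mass_from_diameter_g(diameter_m, density_kg_m3=3000.0):
    """Mass in grams of a sphere of the given diameter (density is an assumption)."""
    radius_m = diameter_m / 2.0
    volume_m3 = 4.0 / 3.0 * math.pi * radius_m ** 3
    return volume_m3 * density_kg_m3 * 1000.0  # convert kg to g


def lognorm_sf(x, s, loc=0.0, scale=1.0):
    """P(X > x) for a log-normal with shape s, shifted by loc and scaled by scale."""
    if x <= loc:
        return 1.0  # the distribution has no mass at or below loc
    z = math.log((x - loc) / scale) / s
    return 0.5 * math.erfc(z / math.sqrt(2.0))
```

If the distribution was fitted to log mass rather than mass, the value passed as `x` should be the logarithm of the threshold mass, for example `math.log10(mass_from_diameter_g(1000))`.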

So we now have the probability of an asteroid being above a certain mass when it hits Earth. But to answer the question “What is the probability that one or more asteroids of high mass strike Earth in 1000 years?” we need to factor in the time component. Assume that we expect N impacts in the next 1000 years and that each impact is independent of the others. Then:

P(≥ 1 highmass) = 1 - P(0 highmass) = 1 - P(all N not highmass) = 1 - P(not_highmass)^N

So to give a number, we need to calculate N from the yearly rate, the number of years, and our detection efficiency, and combine it with the probability that any given impact is not high mass.

Figure 9: Probability that a >1km asteroid impacts within 1000 years
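The final step can be sketched in a few lines. Here the detection efficiency is assumed to be the fraction of real impacts that end up in the dataset, so the observed yearly rate is divided by it; that interpretation, like the function names, is an assumption for illustration. Because the per-impact probability is tiny, the formula 1 - (1 - p)^N is evaluated with `log1p` and `expm1` to avoid rounding it to zero.

```python
import math


def expected_impacts(yearly_rate, years=1000, detection_efficiency=1.0):
    """Expected number of impacts N, corrected for impacts that go unrecorded."""
    if not 0.0 < detection_efficiency <= 1.0:
        raise ValueError("detection_efficiency must be in (0, 1]")
    return yearly_rate * years / detection_efficiency


def prob_at_least_one(p_high_mass, n_impacts):
    """Evaluate 1 - (1 - p)^N in a way that stays accurate for very small p."""
    if p_high_mass >= 1.0:
        return 1.0 if n_impacts > 0 else 0.0
    return -math.expm1(n_impacts * math.log1p(-p_high_mass))
```

Sweeping `detection_efficiency` over a range of values and plotting `prob_at_least_one` against it shows how sensitive the answer is to the assumed fraction of impacts that are actually recorded.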

A more detailed description of this project’s implementation in Python is available in this GitHub repository: Link to Github repository
