Correlation between the diameter and the number of layers

This analysis will investigate the presence of a possible correlation between the crater diameter and the number of azimuthal layers.

From the graph, there seems to be a slight correlation between the number of layers and the crater dimension. The correlation is positive: a higher number of layers corresponds to a greater crater diameter.

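The Pearson correlation coefficient (0.13527, RSquared = 0.018298) confirms the direction of this trend, although the correlation is weak: only about 1.8% of the variance in diameter is explained by the number of layers.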
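
The RSquared value quoted above is simply the square of the Pearson coefficient. A minimal sketch of that arithmetic, using the rounded coefficient reported in this post (the variable names are only illustrative):

```python
# Pearson coefficient reported above (rounded to five decimals)
r = 0.13527

# RSquared is the square of r: the fraction of variance explained
r_squared = r ** 2

print(round(r_squared, 6))        # 0.018298
print(round(r_squared * 100, 2))  # 1.83 (percent of variance explained)
```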

Correlation between the diameter and the latitude

This analysis will investigate the presence of a possible correlation between the crater diameter and the latitude.

From the graph, there seems to be no correlation between the latitude and the crater dimension.

However, the Pearson correlation coefficient (-0.057940) highlights a slight but significant negative correlation: craters towards the north pole tend to be slightly smaller.

The RSquared of 0.00335708532421 indicates that only about 0.34% of the variance of the response variable (diameter) is explained by the explanatory variable (latitude), so the correlation, although statistically significant, has very little explanatory power.
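
A very small coefficient can still be statistically significant when the sample is large, so a tiny RSquared and a significant result do not contradict each other. The following sketch illustrates this with synthetic data, not the crater dataset; the sample size, seed and slope are assumptions chosen only for illustration.

```python
import numpy as np
from scipy import stats

# Synthetic data (NOT the crater dataset): a large sample with a weak
# negative linear relationship buried in noise.
rng = np.random.default_rng(0)   # fixed seed, arbitrary choice
n = 100_000                      # assumed sample size, for illustration only
x = rng.normal(size=n)
y = -0.06 * x + rng.normal(size=n)

# pearsonr returns the coefficient and the two-sided p-value
r, p_value = stats.pearsonr(x, y)

print("r =", r)                # small in magnitude
print("p-value =", p_value)    # far below 0.05 because n is large
print("RSquared =", r ** 2)    # well under 1% of the variance explained
```

The p-value answers whether the correlation differs from zero, while RSquared answers how much of the variance it accounts for; with many observations the first can be convincing even when the second is negligible.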

Correlation between the number of azimuthal layers and the latitude

This analysis will investigate the presence of a possible correlation between the number of azimuthal layers and the latitude.

From the graph, there seems to be a slight positive correlation between the latitude and the number of azimuthal layers, so that craters closer to the north pole have a higher number of layers.

The Pearson correlation coefficient (0.062959) agrees with the graph: it highlights a slight but significant positive correlation, so craters towards the north pole tend to have a higher number of layers.

The RSquared of 0.0039638 indicates that only about 0.4% of the variance of the response variable is explained by the explanatory variable, so this correlation also has very little explanatory power.

In the end, this correlation analysis did not give the expected results: it did not reveal information that helps to better understand the history and evolution of Mars’ craters.

PYTHON CODE

# -*- coding: utf-8 -*-
import pandas
import numpy
import scipy.stats  # import the submodule explicitly so that scipy.stats.pearsonr is available
import statsmodels.formula.api as smf
import seaborn
import matplotlib.pyplot as plt
import statsmodels.stats.multicomp as multi
data = pandas.read_csv('marscrater_pds.csv', low_memory=False)
print (len(data))
print (len(data.columns))

### ANOVA
print ("This analysis will investigate the presence of a relationship between the latitude of the craters on Mars’ surface and the number of azimuthal layers and the dimension of the craters themselves, in order to formulate possible hypothesis of the evolution of Mars craters.")

# Divide the latitude in 4 categories, explore the variance in number of layers and dimensions
print ("The latitude coordinate, which is a quantitative variable, is divided into 4 categories, corresponding to half an hemisphere each. The difference in the number of layers and in the dimension of the crater is explored.")

# Define the function for latitude categorisation

def latitude_categorisation_function (data): 
    if -100 <= data['LATITUDE_CIRCLE_IMAGE'] < -50:
        return "south pole"
    elif -50 <= data['LATITUDE_CIRCLE_IMAGE'] < 0:
        return "south equator"    
    elif 0 <= data['LATITUDE_CIRCLE_IMAGE'] < 50:
        return "north equator"
    elif 50 <= data['LATITUDE_CIRCLE_IMAGE'] <= 100:
        return "north pole"
    
# Define the function for longitude categorisation
def longitude_categorisation_function (data):    
    if -200 <= data['LONGITUDE_CIRCLE_IMAGE'] < -100:
        return "1"    
    elif -100 <= data['LONGITUDE_CIRCLE_IMAGE'] < 0:
        return "2"    
    elif 0 <= data['LONGITUDE_CIRCLE_IMAGE'] < 100:
        return "3"    
    elif 100 <= data['LONGITUDE_CIRCLE_IMAGE'] <= 200:
        return "4"
# Categorise the latitude and the longitude (the functions are applied row by row)
data['Latitude_areas'] = data.apply(latitude_categorisation_function, axis=1)
data['Latitude_areas'] = data['Latitude_areas'].astype('category')
data['Longitude_areas'] = data.apply(longitude_categorisation_function, axis=1)
data['Longitude_areas'] = data['Longitude_areas'].astype('category')
        
# ANOVA between latitude and number of layers
print ("ANOVA: latitude and number of layers.")
anova_model_latitude_layers = smf.ols (formula = 'NUMBER_LAYERS ~ C(Latitude_areas)', data=data)
print(anova_model_latitude_layers.fit().summary())

# factorplot was renamed catplot in seaborn 0.9; kind="point" keeps the same point plot
seaborn.catplot(x="Latitude_areas", y="NUMBER_LAYERS", data=data, kind="point")
plt.xlabel("Latitude")
plt.ylabel("Number of layers")

# Comparison of means and standard deviations
mean_latitude_layers = data.groupby("Latitude_areas")['NUMBER_LAYERS'].mean()
print(mean_latitude_layers)

std_latitude_layers = data.groupby("Latitude_areas")['NUMBER_LAYERS'].std()
print(std_latitude_layers)

# Post hoc
post_hoc_diameter_latitude_layers = multi.MultiComparison (data['NUMBER_LAYERS'], data['Latitude_areas'])
print(post_hoc_diameter_latitude_layers.tukeyhsd().summary())

# ANOVA between latitude and diameter
print ("ANOVA: latitude and diameter.")
anova_model_latitude_diameter = smf.ols (formula = 'DIAM_CIRCLE_IMAGE ~ C(Latitude_areas)', data=data)
print(anova_model_latitude_diameter.fit().summary())
# factorplot was renamed catplot in seaborn 0.9; kind="point" keeps the same point plot
seaborn.catplot(x="Latitude_areas", y="DIAM_CIRCLE_IMAGE", data=data, kind="point")
plt.xlabel("Latitude")
plt.ylabel("Crater diameter")

# Comparison of means and standard deviations
mean_latitude_diameter = data.groupby("Latitude_areas")['DIAM_CIRCLE_IMAGE'].mean()
print(mean_latitude_diameter)
std_latitude_diameter = data.groupby("Latitude_areas")['DIAM_CIRCLE_IMAGE'].std()
print(std_latitude_diameter)

# Post hoc
post_hoc_diameter_latitude_diameter = multi.MultiComparison (data['DIAM_CIRCLE_IMAGE'], data['Latitude_areas'])
print(post_hoc_diameter_latitude_diameter.tukeyhsd().summary())

### PEARSON CORRELATION
print ("This analysis will investigate the presence of a relationship between the latitude of the craters on Mars’ surface and the number of azimuthal layers and the diameter, in order to see if it is possible to find a correlation between the crater’s position and features.")

# Correlation between the diameter and the number of layers
print ("This analysis will investigate the presence of a possible correlation between the crater diameter and the number of azimuthal layers.")

seaborn.regplot (x='NUMBER_LAYERS', y='DIAM_CIRCLE_IMAGE', data=data)
plt.ylabel('Crater diameter')
plt.xlabel('Number of layers')
plt.title('Correlation between the crater diameter and the number of azimuthal layers')

print ("From the graph, there seems to be a slight correlation between the number of layers and the crater dimension. The correlation is positive: a higher number of layers corresponds to a greater crater diameter.")

correlation_layers_diameter = scipy.stats.pearsonr (data['NUMBER_LAYERS'], data['DIAM_CIRCLE_IMAGE'])
print(correlation_layers_diameter)
print(correlation_layers_diameter[0]**2)

print ("The calculation of the correlation coefficient seems to confirm this trend.")

# Correlation between the diameter and the latitude
print ("This analysis will investigate the presence of a possible correlation between the crater diameter and the latitude.")

seaborn.regplot (x='LATITUDE_CIRCLE_IMAGE', y='DIAM_CIRCLE_IMAGE', data=data)
plt.ylabel('Crater diameter')
plt.xlabel('Latitude')
plt.title('Correlation between the crater diameter and the latitude')

print ("From the graph, there seems to be no correlation between the latitude and the crater dimension.")

correlation_latitude_diameter = scipy.stats.pearsonr (data['LATITUDE_CIRCLE_IMAGE'], data['DIAM_CIRCLE_IMAGE'])
print(correlation_latitude_diameter)
print ("However, the Pearson correlation coefficient highlights a slight but significant negative correlation: craters towards the north pole tend to be slightly smaller.")
print(correlation_latitude_diameter[0]**2)
print ('The RSquared of %s indicates that only %s percent of the variance of the response variable is explained by the explanatory variable, so the correlation has very little explanatory power.' %(correlation_latitude_diameter[0]**2,correlation_latitude_diameter[0]**2*100))

# Correlation between the number of layers and the latitude
print ("This analysis will investigate the presence of a possible correlation between the number of azimuthal layers and the latitude.")

seaborn.regplot (y='LATITUDE_CIRCLE_IMAGE', x='NUMBER_LAYERS', data=data)
plt.xlabel('Number of layers')
plt.ylabel('Latitude')
plt.title('Correlation between the number of layers and the latitude')

print ("From the graph, there seems to be a slight positive correlation between the latitude and the number of azimuthal layers, so that craters closer to the north pole have a higher number of layers.")

correlation_layers_latitude = scipy.stats.pearsonr (data['LATITUDE_CIRCLE_IMAGE'], data['NUMBER_LAYERS'])
print(correlation_layers_latitude)
print ("The Pearson correlation coefficient agrees with the graph: it highlights a slight but significant positive correlation, so craters towards the north pole tend to have a higher number of layers.")
print(correlation_layers_latitude[0]**2)
print ("The RSquared of 0.0039638 indicates that only about 0.4% of the variance of the response variable is explained by the explanatory variable, so this correlation also has very little explanatory power.")
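
The latitude categorisation in the script above applies a Python function to every row. As a possible alternative, the same four bands could be built in a single vectorised call with `pandas.cut`. This is only a sketch on a handful of made-up latitudes (the sample values are assumptions, not taken from the dataset); the bin edges and labels mirror those used in `latitude_categorisation_function`.

```python
import pandas as pd

# Made-up sample latitudes, for illustration only
latitudes = pd.Series([-75.0, -50.0, -10.0, 0.0, 49.9, 50.0, 85.0])

# Same band edges and labels as latitude_categorisation_function
bins = [-100, -50, 0, 50, 100]
labels = ["south pole", "south equator", "north equator", "north pole"]

# right=False makes each band closed on the left: [-100, -50), [-50, 0), ...
areas = pd.cut(latitudes, bins=bins, labels=labels, right=False)

print(areas.tolist())
```

The result is already a categorical column, so no separate `astype('category')` step is needed. One difference from the original function: with `right=False` a value of exactly 100 would fall outside the last band, which does not matter for latitudes since they cannot exceed 90.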