My Portfolio

Overview

Objective: Create a visual representation of the correlation matrix to highlight potential relationships between different risk tags.

Essential Terminology:

1. Smart Contract: A self-executing program stored on the blockchain that automatically carries out an agreement when certain conditions are met without needing a middleman.

- For Example: Let’s say you want to buy a digital artwork. A smart contract can be set up so that as soon as you send the payment, the artwork is automatically transferred to you. No need for a third party like PayPal or a bank.

- Smart contracts' use cases are ever-growing. They can even be used for something as simple as a pair of friends making a bet on which team will win a sports game.

- Why Are They Important?

No Middlemen: No banks, lawyers, or companies are needed to enforce the contract.
Transparent: Everyone can see how the contract works (no hidden rules).
Secure & Immutable: Once written on the blockchain, it can’t be changed or tampered with.

2. Risk Tag: Think of risk tags like warning labels on food—they alert you to potential dangers before you interact with a wallet or contract. These risk tags are provided by Webacy, a security platform that helps protect crypto and NFT assets. There are a total of 32 risk tags, each representing a different type of risk.

- What do they detect?

Risks of a contract taking your funds through scam, hacks, or fraud.
Hacking of your tokens.
Suspicious activity, such as involvement in shady transactions.
Fake contracts mimicking legitimate ones.

Steps:

Step 1: Import the necessary libraries and load the dataset

The libraries:
     - Pandas
     - Numpy
     - Matplotlib
     - Seaborn
     - Scipy
     - networkx

The dataset you will be using is a "compiled_risk_dataset". What exactly is in this dataset? This dataset contains 1094 entries of smart contract vulnerabilities. The first 3 columns contain essential information about the smart contract: the project name, the smart contract address, and the chain. The remaining columns are 32 potential risk tags that may be present in any given smart contract. The dataset is essentially a table that lists what specific risks are present in each contract.

Download the dataset and save it into a pandas dataframe. Print the first five rows using the .head() function.

Step 2: Create Function to Calculate Correlations

How will we calculate the correlations?
We will use the Phi coefficient, which is specifically designed for binary data. The Phi Coefficient is a measure of the association between two binary variables. To calculate the Phi coefficient, we first need to establish a function that can handle this calculation. Create a function that can compute correlations of 2 binary variables.

The following code creates a contingency table:

contingency_table = pd.crosstab(x, y)

The following code calculates the Phi coefficient:

chi2 = scipy.stats.chi2_contingency(contingency_table, correction=False)[0]

n = np.sum(np.sum(contingency_table))

phi = np.sqrt(chi2 / n)

Step 3: Calculate Correlations

First, we need to create an array called risk_columns that will contain all of our risk tags.
Load the array into a pandas dataframe.
Create a DataFrame to store Phi coefficients using the following code:

phi_matrix = pd.DataFrame(index=risk_df.columns, columns=risk_df.columns)

Use the following code to calculate Phi coefficient for each pair of binary variables:

for var1 in risk_df.columns:

    for var2 in risk_df.columns:

        phi_matrix.loc[var1, var2] = phi_coefficient(risk_df[var1], risk_df[var2])

Step 4: Visualization

Now it is time to visualize our analysis. Use the following code to create a heatmap:

plt.figure(figsize=(12, 10)) # Setting the size of the plot

sns.heatmap(phi_matrix.astype(float), annot=False, fmt=".2f", cmap='coolwarm', vmin=-1, vmax=1)

plt.title('Heatmap of Phi Coefficients Between Risk Tags')

plt.xticks(rotation=45, ha='right')

plt.show()

Your heatmap should look like the following:

Findings

Positive correlations:
1. buy_tax and sell_tax
2. encode_packed_parameters and encode_packed_collision
3. encode_packed_collision and is_airdrop_scam
4. illegal_unicode and is_airdrop_scam
5. anti_whale_modifiable and slippage_modifiable
6. is_false_token and is_airdrop_scam
7. shadowing_local and is_airdrop_scam
8. shadowing_local and encode_packed_collision

Deeper insights:

"is_blacklisted" has a weak correlation with "is_honeypot". This may suggest that blacklisting mechanisms are not commonly seen in contracts that try to trap users.
Since "is_honeypot" and "anti_whale_modifiable" are correlated, it may suggest that contracts implementing anti-whale mechanisms often include honeypot-like restrictions.
The strong correlation between "can_take_back_ownership" and "owner_change_balance" may indicate that contracts allowing ownership reversal often also have direct balance modification risks.
"centralized_risk_high", "centralized_risk_medium", and "centralized_risk_low" might show strong intra-group correlations, indicating that centralization-related vulnerabilities frequently co-occur.
"selfdestruct" and "trust_list" might correlate, meaning contracts using self-destruction mechanisms often implement allowlists. This can be a common issue among contracts that have both of these risk tags.

Correlation Analysis of Smart Contract Risks

Performed Using Python