Overview
Objective:
Document the occurrence rates of each risk tag across the dataset, providing a clear quantification of the most prevalent vulnerabilities.
Essential Terminology:
1. Smart Contract: A self-executing program stored on the blockchain that automatically carries out an agreement when certain conditions are met without needing a middleman.
- For Example: Let’s say you want to buy a digital artwork. A smart contract can be set up so that as soon as you send the payment, the artwork is automatically transferred to you. No need for a third party like PayPal or a bank.
- Smart contracts' use cases are ever-growing. They can even be used for something as simple as a pair of friends making a bet on which team will win a sports game.
- Why Are They Important?
- No Middlemen: No banks, lawyers, or companies are needed to enforce the contract.
- Transparent: Everyone can see how the contract works (no hidden rules).
- Secure & Immutable: Once written on the blockchain, it can’t be changed or tampered with.
2. Risk Tag: Think of risk tags like warning labels on food—they alert you to potential dangers before you interact with a wallet or contract. These risk tags are provided by Webacy, a security platform that helps protect crypto and NFT assets. There are a total of 32 risk tags, each representing a different type of risk.
- What do they detect?
- Risks of a contract taking your funds through scam, hacks, or fraud.
- Hacking of your tokens.
- Suspicious activity, such as involvement in shady transactions.
- Fake contracts mimicking legitimate ones.
Steps:
Frequency Analysis in Excel
Step 1: Prepare for analysis
- To make the data easier to work with you can freeze the top row so you can always see your column headers as you scroll down. This is particularly useful with large datasets. (Go to View > Freeze Panes > Freeze Top Row.)
- Create a new worksheet dedicated to the summary table. This helps keep your analysis organized and separate from the raw data.
- In column A of the new worksheet, list all the risk tags you are analyzing. Start from A2, leaving A1 for the header "Risk Tag".
Step 2: Calculating the Frequencies and Percentages
- Set Up Frequency Calculation
In column B next to each risk tag, calculate the frequency of True values. Label column B as "Frequency of True". For example, if the risk tag 'Is_honeypot' is in column B of a worksheet named 'Data', in B2 you would enter:
=COUNTIF(Data!B:B, TRUE)
- Calculate Total Number of Entries
Determine the total number of entries in your dataset to use in calculating percentages. Place this formula in a cell in the summary worksheet, for example in B1 (consider renaming it to a more descriptive cell name for clarity):
=COUNTA(Data!B2:B101) // Adjust the range as necessary
Ensure this range matches the total data entries (excluding headers).
- Calculate the Percentage of True Values
In column C next to the frequency count, calculate the percentage that each frequency represents relative to the total number of entries. Label column C as "Percentage of True". In C2, you would enter:
=B2 / $B$1
Format the cells in Column C to percentage format for better readability.
- Drag to Fill the Formulas
After entering the formulas in B2 and C2 for the first risk tag, use the fill handle to drag the formula down through the column to automatically adjust it for the remaining risk tags. This action copies the formula while adjusting the row references accordingly but keeps the reference to the total number of entries absolute (using $ signs).
Step 3: Visualization
- Highlight all of your frequency data including the headers.
- Insert a Bar Chart: Go to Insert > Charts > Bar Chart and choose the first simple bar chart option. Excel will generate a bar chart displaying the frequency of True values for each risk tag.

Step 4: Findings
What are the most frequently seen risk tags?
1. Exploitation (43%)
2. Bad_contract (34%)
3. external_dependencies (29%)
What are the least frequently seen risk tags?
1. illegal_unicode (6%)
2. is_airdrop_scam (6%)
3. is_blacklisted (7%)
Frequency Analysis in Python
Step 1: Import the necessary libraries and load the dataset
The libraries:
- Pandas
- Matplotlib
- Seaborn
Download the dataset and save it into a pandas dataframe. Print the first five rows using the .head()
function.
Step 2: Calculating Frequencies
- First, we need to create an array called risk_columns that will contain all of our risk tags.
- We'll now use pandas to count how many True values appear in each column which represents our risk tags:
frequencies = data[risk_columns].apply(lambda x: x.value_counts()).loc[True]
frequencies = frequencies.fillna(0) # Replace NaN with 0 for any column that may not have True values
Step 3: Creating Advanced Visualizations
Just like in Excel, for frequency analysis a simple bar chart is enough.
sns.set_style("whitegrid") # set a style for your chart
plt.figure(figsize=(12, 8))
sns.barplot(x=frequencies.index, y=frequencies.values, palette='viridis')
plt.title('Frequency of True Values for Each Risk Tag')
plt.xlabel('Risk Tags')
plt.ylabel('Frequency of True')
plt.xticks(rotation=90)
plt.show()

Step 4: Findings
What are the most frequently seen risk tags?
1. Exploitation
2. Bad_contract
3. external_dependencies
What are the least frequently seen risk tags?
1. illegal_unicode
2. is_airdrop_scam
3. is_blacklisted