Of course, Venmo is perfect for splitting the bill after that crazy night out with your friends. Yet it is easy to forget the pitfall of having that convenience at your fingertips: compromised privacy.
In fact, all payments you make through Venmo are publicly accessible unless you specifically make them private. These public data can offer some interesting insights, such as what people (supposedly) buy and how much they spend. While I was searching for a suitable topic for my final project for an NYU class called Python for App this spring, a more interesting, perhaps mischievous, question came to mind: “What kind of bad things do people buy through Venmo?”
Using a web scraping tool allows us to collect the data quickly, without the intensive coding that working with the Venmo API would involve.
First, we need to install PhantomJS and Selenium bindings for Python. I am quoting the code snippet from Hayton’s blog post.
Alternatively, you can download PhantomJS and Selenium manually and place them in your virtual environment's library. The links are below:
Now, let’s see how this is done.
There are three tasks that need to be done in the following order:
- Scrape web elements
- Parse HTML elements
- Cleanse data
We create three classes to do these tasks: LetMeScrapeThat, LetMeParseThat, and LetMeAnalyzeThat. An additional class, VicemoScraper, instantiates these three classes. The code snippet below showcases how PhantomJS and Selenium are used.
First, the class LetMeScrapeThat instantiates a PhantomJS headless browser by calling webdriver.PhantomJS(). We then access the target website via the method self.phantom_webpage.get(link). Note how the method scrape_vicemo() scrolls down in our PhantomJS browser to access more data, which are rendered dynamically as the user scrolls down the page. Finally, we extract the HTML code from the web elements and store it in a list.
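To make the load, scroll, and extract sequence concrete, here is a minimal sketch. It is not the project's own code: the function name scroll_and_collect, the CSS selector, and the scroll count and pause are all assumptions chosen for illustration. The function takes any Selenium-style driver object, so it does not depend on a particular browser.

```python
import time


def scroll_and_collect(driver, link, css_selector, scrolls=3, pause=1.0):
    """Load a page, scroll to trigger lazy loading, and return element HTML.

    `driver` is any Selenium-style WebDriver. The selector, scroll count,
    and pause are illustrative assumptions, not values from the project.
    """
    driver.get(link)
    for _ in range(scrolls):
        # Jump to the bottom so the page renders its next batch of items.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        # Give the dynamically loaded content time to appear.
        time.sleep(pause)
    # "css selector" is the string value of Selenium's By.CSS_SELECTOR.
    elements = driver.find_elements("css selector", css_selector)
    return [element.get_attribute("outerHTML") for element in elements]
```

With Selenium 3.x or earlier, the driver could be webdriver.PhantomJS(), as in this post. PhantomJS support was removed in Selenium 4, so on newer versions a headless Chrome or Firefox driver would be the likely substitute.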
Let’s now take a look at the scraped HTML. Note that the description of the transaction is within a <div> tag with class="description". Also, note how each emoji is represented by a title=emoji-name attribute on its tag.
Take a look at the code snippet for the LetMeParseThat class below. As observed, the description of a transaction comes in both strings and emojis, so we use dedicated methods, such as extract_emoji_data(), to extract the strings and emojis from the HTML accordingly.
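As a rough illustration of this parsing step, the sketch below pulls plain text and emoji names out of description markup using Python's built-in html.parser module. This is a swapped-in technique for illustration and may differ from what LetMeParseThat actually does. The names DescriptionParser and parse_descriptions are hypothetical, and the sample markup in the usage note assumes the emoji sits in a <span>, which the post does not confirm.

```python
from html.parser import HTMLParser


class DescriptionParser(HTMLParser):
    """Collects text and emoji titles found inside <div class="description">."""

    def __init__(self):
        super().__init__()
        self.div_depth = 0  # greater than 0 while inside a description <div>
        self.texts = []
        self.emojis = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if self.div_depth > 0:
            if tag == "div":
                self.div_depth += 1  # track nested <div> tags
            title = attr_map.get("title")
            if title:
                # Assumption: any title attribute here names an emoji.
                self.emojis.append(title)
        elif tag == "div" and "description" in (attr_map.get("class") or "").split():
            self.div_depth = 1

    def handle_endtag(self, tag):
        if tag == "div" and self.div_depth > 0:
            self.div_depth -= 1

    def handle_data(self, data):
        if self.div_depth > 0 and data.strip():
            self.texts.append(data.strip())


def parse_descriptions(html_snippets):
    """Return (texts, emoji_names) gathered from a list of HTML strings."""
    parser = DescriptionParser()
    for snippet in html_snippets:
        parser.feed(snippet)
    parser.close()
    return parser.texts, parser.emojis
```

For example, given the made-up snippet <div class="description">Pizza night <span title="beer"></span> again</div>, parse_descriptions would return the texts "Pizza night" and "again" along with the emoji name "beer".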
Moreover, these data need to be cleansed before we can use them. The class LetMeAnalyzeThat does this job.
Note that the most commonly used English words are hard-coded to help the program filter out trivial words. Remember, we are only interested in what Venmo users are paying for. Using regular expressions, we strip special characters and whitespace from the words. The cleansed words are then compiled into a collection. A similar process is carried out to extract the emoji data. The resulting outputs are dictionaries containing the objects or activities Venmo users paid for and how many times each appears in the collected data. We now have all the tools to obtain the data.
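The cleansing step can be sketched roughly as follows. This is a simplified illustration, not the LetMeAnalyzeThat implementation: the function name count_words is hypothetical, and the stop-word set is a short stand-in for the longer hard-coded list.

```python
import re
from collections import Counter

# Hypothetical, abbreviated stand-in for the hard-coded list of common words.
COMMON_WORDS = {"the", "a", "an", "and", "for", "to", "of", "in", "on", "my", "you", "is"}


def count_words(descriptions, stop_words=COMMON_WORDS):
    """Count non-trivial words across a list of description strings."""
    counts = Counter()
    for text in descriptions:
        for raw in text.lower().split():
            # Drop everything except ASCII letters and digits.
            word = re.sub(r"[^a-z0-9]", "", raw)
            if word and word not in stop_words:
                counts[word] += 1
    return dict(counts)
```

Calling count_words(["Pizza for the party!", "pizza & beer"]) gives a dictionary in which pizza appears twice and party and beer once each. Note that this simple pattern also strips accented letters, so a fuller version would need a broader character class.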
The working code for this project is available on GitHub. Here is the link. There are many different ways to use these data. Below is an example visualization that I created in Tableau Public with data obtained on July 27th.