Thursday, 30 January 2014

Downloading vidd.me: The fun of sequential file naming

Vidd.me launched recently as a simple and effective way to share videos online. You upload your video file or GIF, it converts it to MP4, and you get a link to the nice HTML5 page it's displayed on. It seems to work rather nicely, and the team are adding new features all the time without adding new limits.

Since it loads the raw mp4 file into the browser, getting the file is easy. Open up the source to any page and you have something like this:


And with our handy friend wget, you can download the full video from the command line without any special tools at all.
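As a rough sketch, assuming the MP4 sits at the CloudFront path noted further down and using a made-up video number (1234), the command boils down to something like this. The video_url helper is a hypothetical name used only for illustration.

```shell
#!/bin/bash
# Build the direct URL for a video number. The host and path are the
# ones that appear in this post; the helper name is hypothetical.
video_url() {
    printf 'https://d1wst0behutosd.cloudfront.net/videos/%s.mp4\n' "$1"
}

# Print the command for a made-up video number.
# Drop the leading "echo" to actually download the file.
echo wget "$(video_url 1234)"
```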




So far it's business as usual. You could have used anything to download it (including your browser). Things get a little more interesting when you look at the source of their 'new' or 'top videos' page:
 


There are three things to note here:

  • Each preview on the page has '-clip' added to the filename, to distinguish it from the main mp4 file
  • Every file seems to be stored on the same server using the same directory structure, d1wst0behutosd.cloudfront.net/videos/
  • The files appear to be named sequentially, with gaps for deleted or private files.
This presents a very simple way to find content. At the time I'm writing this they've got around 4200 videos, so let's pull down all the previews on their site in just a few lines:

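#!/bin/bash
# Try every preview clip from 1 to 4200. Numbers with no file behind
# them just make wget report an error, and the loop moves on.
for i in {1..4200}; do
    wget "https://d1wst0behutosd.cloudfront.net/videos/${i}-clip.mp4"
done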

A little while later.


If you feel like downloading everything on the site - full length - you just have to modify the script above ever-so-slightly, like so:

#!/bin/bash
# Same loop as before, minus the "-clip" suffix, so it fetches the
# full-length file for each number.
for i in {1..4200}; do
    wget "https://d1wst0behutosd.cloudfront.net/videos/${i}.mp4"
done


To take it further, it'd be pretty simple to set up a script that updates your mirror of the site: a cron job that grabs the 'latest videos' page, parses it for the highest video number, and then downloads everything from the last downloaded video up to that point (or just counts up from the last known video until it hits too many failures in a row). As a side effect of the one-way sync, you'd have a copy of any videos that were subsequently removed.
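The count-up variant could be sketched roughly as follows. Everything except the CloudFront path is an assumption made for illustration: the state file location, the limit of 20 misses in a row, the --run switch, and the function names are all made up. highest_id shows one way to pull the top number out of a saved copy of the 'new' page, assuming the clip URLs appear in the markup as described above.

```shell
#!/bin/bash
# Sketch only. BASE matches the post; STATE_FILE, MAX_FAILS and the
# function names are hypothetical.
BASE="https://d1wst0behutosd.cloudfront.net/videos"
STATE_FILE="${STATE_FILE:-$HOME/.viddme-last}"
MAX_FAILS="${MAX_FAILS:-20}"

# Print the highest video number found in HTML read from stdin.
highest_id() {
    grep -oE 'videos/[0-9]+(-clip)?\.mp4' |
        sed -E 's#^videos/([0-9]+).*#\1#' |
        sort -n | tail -n 1
}

# Download one full-length video; returns wget's exit status.
fetch() {
    wget -nv "$BASE/$1.mp4"
}

# Count up from the number after $1 and stop after MAX_FAILS misses
# in a row. Prints the last number that downloaded successfully.
mirror_from() {
    local last id fails
    last="$1"
    id=$(( $1 + 1 ))
    fails=0
    while [ "$fails" -lt "$MAX_FAILS" ]; do
        if fetch "$id" >&2; then
            last="$id"
            fails=0
        else
            fails=$(( fails + 1 ))
        fi
        id=$(( id + 1 ))
    done
    echo "$last"
}

# Only touch the network when explicitly asked to, e.g. from cron.
if [ "${1:-}" = "--run" ]; then
    last=$(cat "$STATE_FILE" 2>/dev/null || echo 0)
    mirror_from "${last:-0}" > "$STATE_FILE.tmp" &&
        mv "$STATE_FILE.tmp" "$STATE_FILE"
fi
```

A gap longer than MAX_FAILS deleted or private videos would stop the run early, so parsing the listing page with highest_id is the safer way to find the upper bound.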

I have to point out that mirroring the site like this could violate the terms of service, which have a few conditions against scraping and DDoSing that you might fall foul of. They don't seem to throttle heavy users yet, but it's likely they will.

Instead of that, I'm more interested in picking and choosing based on what looks interesting from those previews I grabbed. Let's create a quick script that lets me specify an arbitrary number of videos (based on their number) and download them:

#!/bin/bash
# Download each video number given on the command line.
if [ "$#" -eq 0 ]; then
    echo "no input arguments detected" >&2
    exit 1
fi
for arg in "$@"; do
    wget -nv "https://d1wst0behutosd.cloudfront.net/videos/${arg}.mp4"
done

It takes the video numbers as command-line arguments, so now I can do this:
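For example, here is a dry run with made-up video numbers. It wraps the same loop in a function called grab and adds a DRY_RUN switch that prints the wget commands instead of running them. Both names are hypothetical and only there so the example is safe to paste.

```shell
#!/bin/bash
# Hypothetical wrapper around the same loop. With DRY_RUN set, the
# wget commands are printed rather than executed.
grab() {
    local n
    for n in "$@"; do
        ${DRY_RUN:+echo} wget -nv "https://d1wst0behutosd.cloudfront.net/videos/${n}.mp4"
    done
}

# Made-up video numbers; remove DRY_RUN=1 to download for real.
DRY_RUN=1 grab 1021 2764 3998
```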


No need for browser extensions or custom software: you can now grab any video you like with what you already have installed.
As long as the privacy controls in place are solid and they put some throttling in place to stop people ripping the whole site every hour, there's nothing wrong with how they've set things up. The site is designed to be as open and accessible as possible, and right now they're doing that from the ground up.
