Since it loads the raw mp4 file into the browser, getting the file is easy. Open up the source to any page and you have something like this:
And with our handy friend wget, you can download the full video from the command line without any special tools at all.
So far it's business as usual. You could have used anything to download it (including your browser). Things get a little more interesting when you look at the source of their 'new' or 'top videos' page:
There are three things to note here:
- Each preview on the page has '-clip' added to the filename, to distinguish it from the main mp4 file
- Every file seems to be stored on the same server using the same directory structure, d1wst0behutosd.cloudfront.net/videos/
- The files appear to be named sequentially, with gaps for deleted or private files.
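Put together, those three observations mean a video's address can be worked out from nothing but its number. Here's a minimal sketch of that; the function name `video_url` and the `clip` keyword are made up for illustration, and the pattern is only what the page source appears to show.

```bash
#!/bin/bash
# Sketch only: video_url is a hypothetical helper, not something the site provides.
BASE_URL="https://d1wst0behutosd.cloudfront.net/videos"

# Print the URL for video number $1.
# Pass "clip" as the second argument to get the preview instead of the full file.
video_url() {
    if [ "${2:-}" = "clip" ]; then
        echo "${BASE_URL}/${1}-clip.mp4"
    else
        echo "${BASE_URL}/${1}.mp4"
    fi
}
```

Calling `video_url 42 clip` prints the preview address for video 42, and `video_url 42` prints the full-length one.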
This presents a very simple way to find content. At the time I'm writing this they've got around 4200 videos, so let's pull down all the previews on their site in just a few lines:
    #!/bin/bash
    for i in {1..4200}
    do
        wget "https://d1wst0behutosd.cloudfront.net/videos/${i}-clip.mp4"
    done
    exit
A little while later.
If you feel like downloading everything on the site - full length - you just have to modify the script above ever-so-slightly, like so:
    #!/bin/bash
    for i in {1..4200}
    do
        wget "https://d1wst0behutosd.cloudfront.net/videos/${i}.mp4"
    done
    exit
To take it further, it'd be pretty simple to set up a script that updates your mirror of the site: a cron job that grabs the 'latest videos' page, parses it for the highest video number, and then downloads everything from the last downloaded video up to that point (or just counts up from the last known video until it hits too many failures in a row). As a side effect of the one-way sync, you'd have a copy of any videos that were subsequently removed.
I have to point out that the last paragraph could violate the Terms of Service, which has a few conditions against scraping and DDoSing that you might fall foul of. While they don't seem to have done so yet, it's also likely they'll throttle heavy users.
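For the curious, the two halves of that sync idea might be sketched roughly as below. Everything here is an assumption for illustration: the function names, the one-second pause, the default limit of 20 misses in a row, and the guess that links on the page contain `videos/<number>`.

```bash
#!/bin/bash
# Sketch only: names, limits and the page format are assumptions, not the site's API.
BASE_URL="https://d1wst0behutosd.cloudfront.net/videos"

# Read HTML on stdin and print the highest video number it references.
# Assumes links look like ".../videos/<number>-clip.mp4".
highest_id() {
    grep -oE 'videos/[0-9]+' | grep -oE '[0-9]+' | sort -n | tail -n 1
}

# Fetch one full-length video by number; returns wget's exit status.
fetch_video() {
    wget -nv "${BASE_URL}/${1}.mp4"
}

# Count up from video number $1 and stop after $2 failures in a row (default 20).
# Prints the highest number that downloaded successfully.
sync_from() {
    local id=$1 limit=${2:-20} misses=0 last=$(( $1 - 1 ))
    while [ "$misses" -lt "$limit" ]; do
        if fetch_video "$id"; then
            last=$id
            misses=0
        else
            misses=$(( misses + 1 ))
        fi
        id=$(( id + 1 ))
        sleep 1   # pause between requests to go easy on the server
    done
    echo "$last"
}
```

Run from cron, `sync_from` would be given one more than the number it printed last time. The failure limit is there because gaps for deleted or private videos mean a single miss doesn't prove you've reached the end.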
Instead of that, I'm more interested in picking and choosing based on what looks interesting from those previews I grabbed. Let's create a quick script that lets me specify an arbitrary number of videos (based on their number) and download them:
    #!/bin/bash
    # Download the full-length video for each number given on the command line.
    if [ "$#" -eq 0 ]; then
        echo "no input arguments detected" >&2
        exit 1
    fi
    for arg in "$@"; do
        wget -nv "https://d1wst0behutosd.cloudfront.net/videos/${arg}.mp4"
    done
No need for browser extensions or custom software: you can now grab any video you like with what you already have installed.
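One optional refinement to the script above is to check that each argument actually looks like a video number before asking the server for it, so a typo doesn't turn into a pointless request. A small sketch of that, with `valid_id` and `filter_ids` being hypothetical helper names:

```bash
#!/bin/bash
# Sketch only: valid_id and filter_ids are made-up helpers for illustration.

# Succeed only if $1 is a non-empty string of digits.
valid_id() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Print the arguments that look like video numbers, one per line,
# and warn on stderr about anything else.
filter_ids() {
    local arg
    for arg in "$@"; do
        if valid_id "$arg"; then
            echo "$arg"
        else
            echo "skipping '$arg': not a video number" >&2
        fi
    done
}
```

The download loop could then iterate over the output of `filter_ids "$@"` instead of the raw arguments.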
As long as the privacy controls in place are solid and they put some throttling in place to stop people ripping the whole site every hour, there's nothing wrong with how they've set things up. The site is designed to be as open and accessible as possible, and right now they're doing that from the ground up.