Tuesday, January 13, 2009

Using wget and dd to download a file segment

This tweet from Prasanna caught my attention:

what would be sweet is if I could download segments from a file using wget ... any ideas?

Well, dd can help.

Suppose we need to extract the 889th through 1298th bytes of the GPLv2 document, located at

1. Create a file (gpl_partial.txt) containing 888 bytes (i.e., the portion to skip)

dd if=/dev/zero of=gpl_partial.txt bs=1 count=888
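
2. Download the rest of the file and append it to gpl_partial.txt (this assumes the server supports the Range header). Because the file already holds 888 bytes, wget -c resumes from that offset instead of starting over.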

wget -c -O gpl_partial.txt

wget shows the byte count as it downloads. You can stop the download (Ctrl+C) once the required bytes have arrived, i.e., once gpl_partial.txt has grown to at least 1298 bytes.

3. Now extract the required segment from gpl_partial.txt and save it to gpl_segment.txt (889th byte to 1298th byte = 410 bytes)

dd if=gpl_partial.txt of=gpl_segment.txt bs=1 skip=888 count=410
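
The skip/count arithmetic in step 3 is easy to get wrong by one, so here is a minimal sketch that checks it on a local file, with no network involved. The helper name extract_segment, the temporary directory, and the sample file made of 888 'a' bytes, 410 'b' bytes, and 702 'c' bytes are all assumptions made up for this illustration.

```shell
# extract_segment is a hypothetical helper, not a standard tool.
# $1 = input file, $2 = output file
# $3 = first byte wanted (1-based), $4 = last byte wanted (1-based, inclusive)
extract_segment() {
    dd if="$1" of="$2" bs=1 skip=$(( $3 - 1 )) count=$(( $4 - $3 + 1 )) 2>/dev/null
}

# Build a 2000-byte sample: bytes 1-888 are 'a', 889-1298 are 'b', the rest 'c'.
tmpdir=$(mktemp -d)
{
    dd if=/dev/zero bs=1 count=888 2>/dev/null | tr '\0' 'a'
    dd if=/dev/zero bs=1 count=410 2>/dev/null | tr '\0' 'b'
    dd if=/dev/zero bs=1 count=702 2>/dev/null | tr '\0' 'c'
} > "$tmpdir/sample.txt"

# Extract bytes 889 through 1298, the same positions used in the post.
extract_segment "$tmpdir/sample.txt" "$tmpdir/segment.txt" 889 1298

# Print the segment size, then the number of bytes in it that are not 'b'.
wc -c < "$tmpdir/segment.txt"
tr -d 'b' < "$tmpdir/segment.txt" | wc -c
```

If the arithmetic is right, the segment should be 410 bytes long and contain nothing but 'b'.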

If wget is not a requirement, cURL gets rid of all this hassle. Its -r option takes a zero-based, inclusive byte range, so the 889th to 1298th bytes are 888-1297. To fetch the segment above:

curl -o gpl_segment.txt -r 888-1297

Or, if you want to do it the Python way, use the httpheader Python module.
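
For the cURL route, the only tricky part is converting "889th to 1298th byte" into the zero-based range that -r expects. A small sketch of that conversion follows; byte_range is a hypothetical helper name, and $url in the commented usage line stands in for the document's address, which is not filled in here.

```shell
# byte_range is a hypothetical helper: it converts 1-based, inclusive byte
# positions into the zero-based "first-last" form used by curl -r and by
# the HTTP Range header.
# $1 = first byte wanted (1-based), $2 = last byte wanted (1-based)
byte_range() {
    printf '%s-%s\n' "$(( $1 - 1 ))" "$(( $2 - 1 ))"
}

byte_range 889 1298   # prints 888-1297

# Possible usage (assumes $url holds the address of the document):
# curl -o gpl_segment.txt -r "$(byte_range 889 1298)" "$url"
```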

1 comment:

  1. la I'm tired of scripting ... tell him to devise me a script that issues commands .. based on my requirements on no of download blocks.. and total size of download