what would be sweet is if I could download segments from a file using wget ... any ideas?
Well, dd can help.
Suppose we need to extract the 889th through 1298th bytes of the GPLv2 text, located at http://www.gnu.org/licenses/gpl-2.0.txt
1. Create a file (gpl_partial.txt) containing 888 bytes (i.e., the portion to skip):
dd if=/dev/zero of=gpl_partial.txt bs=1 count=888
2. Download the rest of the file and append it to gpl_partial.txt (assuming the server supports the Range header). With -c, wget resumes from the size of the existing file, so it requests the data starting at byte offset 888. wget shows the byte count as it downloads, so you can stop it (Ctrl+C) once the required bytes have arrived:
wget -c http://www.gnu.org/licenses/gpl-2.0.txt -O gpl_partial.txt
3. Now extract the required segment from gpl_partial.txt and save it to gpl_segment.txt (889th byte to 1298th byte = 410 bytes)
dd if=gpl_partial.txt of=gpl_segment.txt bs=1 skip=888 count=410
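The skip and count values in step 3 are easy to get off by one, so here is a minimal sketch of the arithmetic. It is only an illustration: extract_bytes is a made-up helper name, and the demo runs on a locally generated sample file rather than the real download, so it needs no network.

```shell
#!/bin/sh
# extract_bytes SRC FIRST LAST OUT
# Hypothetical helper (not part of dd or wget): copies bytes FIRST..LAST
# (1-based, inclusive) of SRC into OUT.
extract_bytes() {
    src=$1; first=$2; last=$3; out=$4
    skip=$((first - 1))            # bytes in front of the segment
    count=$((last - first + 1))    # length of the segment
    dd if="$src" of="$out" bs=1 skip="$skip" count="$count" 2>/dev/null
}

# Demo on a generated sample file (the numbers 1..500, one per line).
demo_dir=$(mktemp -d)
awk 'BEGIN { for (i = 1; i <= 500; i++) print i }' > "$demo_dir/sample.txt"
extract_bytes "$demo_dir/sample.txt" 889 1298 "$demo_dir/segment.txt"
wc -c < "$demo_dir/segment.txt"    # segment size in bytes, 410 here
```

For the GPL example, extract_bytes gpl_partial.txt 889 1298 gpl_segment.txt should be equivalent to the dd command in step 3.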
If wget is not a requirement, cURL avoids all this hassle, because its -r option requests a byte range directly. The offsets are zero-based and inclusive, so bytes 889 through 1298 become 888-1297:
curl -o gpl_segment.txt -r 888-1297 http://www.gnu.org/licenses/gpl-2.0.txt
Or, if you want to do it the Python way, use the httpheader Python module.
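Back on the curl route: a server that does not support Range may simply send the whole file, so it is worth checking the size of what came back. Below is a rough sketch of a wrapper that does this. fetch_segment is a hypothetical name of mine, and the check assumes the requested range lies entirely inside the file.

```shell
#!/bin/sh
# fetch_segment URL FIRST LAST OUT
# Hypothetical helper: downloads bytes FIRST..LAST (1-based, inclusive)
# of URL into OUT using curl's -r option.
fetch_segment() {
    url=$1; first=$2; last=$3; out=$4
    want=$((last - first + 1))
    # curl -r takes zero-based, inclusive offsets, hence the "- 1" on both ends.
    curl -sS -o "$out" -r "$((first - 1))-$((last - 1))" "$url" || return 1
    got=$(wc -c < "$out" | tr -d ' ')
    # If the Range request was ignored, the size will not match.
    if [ "$got" -ne "$want" ]; then
        echo "expected $want bytes, got $got (Range not honored?)" >&2
        return 1
    fi
}

# Usage (needs network access):
# fetch_segment http://www.gnu.org/licenses/gpl-2.0.txt 889 1298 gpl_segment.txt
```

The size check is a cheap sanity test, not proof that the right bytes arrived; comparing against a known copy of the file is the only way to be sure.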