PyMeta
PyMeta searches the web for files on a target domain, downloads them, and extracts metadata such as EXIF data from them.
Installation
Install from source:
git clone https://github.com/m8r0wn/pymeta
cd pymeta
sudo python3 setup.py install
Or install the pymetadata package with pip:
python3 -m pip install pymetadata
Usage
pymeta -d example.com -s all
pymeta -d example.org -s bing
pymeta -dir my_files/
Flags
PyMeta v.1.0.4
-----------------------------------
Search the web for files on the targeted domain
and extract metadata.
optional arguments:
  -h, --help            show this help message and exit
Target Options:
  -d DOMAIN             Target domain
  -dir FILE_DIR         Pre-existing directory of files
Search Options:
  -s {google,bing,all}  Search engine(s) to scrape
  -m MAX_RESULTS        Max results per file type, per search engine (Default: 50)
  -j JITTER             Seconds between search requests (Default: 2)
Output Options:
  -o OUTPUT_DIR         Path to store PyMeta's download folder (Default: ./)
  -f FILENAME           Custom report path/name.csv
  --debug               Show links as they are collected during scraping
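When pymeta is driven from a script, it can help to build the argument list in one place. The following is a minimal sketch, not part of PyMeta itself: the helper name build_pymeta_command and its parameter names are hypothetical, and it only emits the flags listed in the help text above. It also assumes that -d and -dir are alternatives, which the help text suggests but does not state.

```python
import shlex


def build_pymeta_command(domain=None, file_dir=None, engine=None,
                         max_results=None, jitter=None,
                         output_dir=None, report=None, debug=False):
    """Assemble a pymeta argument list (hypothetical helper, not PyMeta's API)."""
    # Assumption: exactly one target option is given, either -d or -dir.
    if (domain is None) == (file_dir is None):
        raise ValueError("pass exactly one of domain (-d) or file_dir (-dir)")

    cmd = ["pymeta"]
    cmd += ["-d", domain] if domain is not None else ["-dir", file_dir]

    if engine is not None:
        # The help text lists these three choices for -s.
        if engine not in ("google", "bing", "all"):
            raise ValueError("engine must be google, bing, or all")
        cmd += ["-s", engine]
    if max_results is not None:
        cmd += ["-m", str(max_results)]
    if jitter is not None:
        cmd += ["-j", str(jitter)]
    if output_dir is not None:
        cmd += ["-o", output_dir]
    if report is not None:
        cmd += ["-f", report]
    if debug:
        cmd.append("--debug")
    return cmd


cmd = build_pymeta_command(domain="example.org", engine="bing", jitter=5)
print(shlex.join(cmd))  # prints: pymeta -d example.org -s bing -j 5
# To run it (requires pymeta on PATH): subprocess.run(cmd, check=True)
```

Returning a list rather than a single string keeps the command safe to pass to subprocess.run without a shell.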
Examples
$ pymeta -d example.com
[*] Starting PyMeta web scraper
[*] Extension | Number of New Files Found | Search URL
[*] pdf : 50 https://www.google.com/search?q=site:example.com+filetype:pdf&num=100&start=0
[*] pdf : 6 http://www.bing.com/search?q=site:example.com%20filetype:pdf&first=0
[*] pdf : 7 http://www.bing.com/search?q=site:example.com%20filetype:pdf&first=43
[*] pdf : 9 http://www.bing.com/search?q=site:example.com%20filetype:pdf&first=82
[*] pdf : 9 http://www.bing.com/search?q=site:example.com%20filetype:pdf&first=120
[*] pdf : 4 http://www.bing.com/search?q=site:example.com%20filetype:pdf&first=157
[*] pdf : 7 http://www.bing.com/search?q=site:example.com%20filetype:pdf&first=195
[*] pdf : 7 http://www.bing.com/search?q=site:example.com%20filetype:pdf&first=233
[*] pdf : 0 http://www.bing.com/search?q=site:example.com%20filetype:pdf&first=270
[*] xls : 0 https://www.google.com/search?q=site:example.com+filetype:xls&num=100&start=0
[*] xls : 2 http://www.bing.com/search?q=site:example.com%20filetype:xls&first=0
[*] xls : 0 http://www.bing.com/search?q=site:example.com%20filetype:xls&first=32
[*] xlsx: 0 https://www.google.com/search?q=site:example.com+filetype:xlsx&num=100&start=0
[*] xlsx: 0 http://www.bing.com/search?q=site:example.com%20filetype:xlsx&first=0
[*] csv : 0 https://www.google.com/search?q=site:example.com+filetype:csv&num=100&start=0
[*] csv : 0 http://www.bing.com/search?q=site:example.com%20filetype:csv&first=0
[*] doc : 0 https://www.google.com/search?q=site:example.com+filetype:doc&num=100&start=0
[*] doc : 0 http://www.bing.com/search?q=site:example.com%20filetype:doc&first=0
[*] docx: 0 https://www.google.com/search?q=site:example.com+filetype:docx&num=100&start=0
[*] docx: 0 http://www.bing.com/search?q=site:example.com%20filetype:docx&first=0
[*] ppt : 0 https://www.google.com/search?q=site:example.com+filetype:ppt&num=100&start=0
[*] ppt : 0 http://www.bing.com/search?q=site:example.com%20filetype:ppt&first=0
[*] pptx: 0 https://www.google.com/search?q=site:example.com+filetype:pptx&num=100&start=0
[*] pptx: 0 http://www.bing.com/search?q=site:example.com%20filetype:pptx&first=0
[*] Downloading 101 files to: ./example_meta/
[*] Extracting Metadata...
[*] Adding source URL's to the report
[+] Report complete: example_meta.csv
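The per-extension counts in the output above can be totalled with a short script. This is a rough sketch that assumes the log line format shown in this example ("[*] ext : count url"); the format may differ between PyMeta versions, and the names LINE_RE and tally_new_files are made up for illustration.

```python
import re
from collections import Counter

# Matches result lines such as:
# "[*] pdf : 50 https://www.google.com/search?q=..."
# Status lines ("[*] Starting ...", "[*] Downloading ...") do not match,
# because the extension group only accepts lowercase letters.
LINE_RE = re.compile(
    r"^\[\*\]\s+(?P<ext>[a-z]+)\s*:\s*(?P<count>\d+)\s+(?P<url>https?://\S+)$"
)


def tally_new_files(log_text):
    """Sum the 'new files found' column per file extension."""
    totals = Counter()
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            totals[match["ext"]] += int(match["count"])
    return totals


# A shortened excerpt of the example output above.
sample = """\
[*] Starting PyMeta web scraper
[*] Extension | Number of New Files Found | Search URL
[*] pdf : 50 https://www.google.com/search?q=site:example.com+filetype:pdf&num=100&start=0
[*] pdf : 6 http://www.bing.com/search?q=site:example.com%20filetype:pdf&first=0
[*] xls : 2 http://www.bing.com/search?q=site:example.com%20filetype:xls&first=0
[*] xlsx: 0 https://www.google.com/search?q=site:example.com+filetype:xlsx&num=100&start=0
[*] Downloading 101 files to: ./example_meta/
"""

totals = tally_new_files(sample)
for ext, count in totals.items():
    print(ext, count)
```

Applied to the full example output, the counts come to 99 pdf and 2 xls files, which matches the "Downloading 101 files" line.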