Bulk_extractor
Program that extracts features such as email addresses, credit card numbers, URLs, and other types of information from digital evidence files.
Installation
Download newest release from Digitalcorpora.org
Usage
bulk_extractor [options] imagefile
Flags
Required parameters:
imagefile - the file to extract
or -R filedir - recurse through a directory of files
HAS SUPPORT FOR E01 FILES
HAS SUPPORT FOR AFF FILES
-o outdir - specifies output directory. Must not exist.
bulk_extractor creates this directory.
Options:
-i - INFO mode. Do a quick random sample and print a report.
-b banner.txt- Add banner.txt contents to the top of every output file.
-r alert_list.txt - a file containing the alert list of features to alert
(can be a feature file or a list of globs)
(can be repeated.)
-w stop_list.txt - a file containing the stop list of features (white list
(can be a feature file or a list of globs)s
(can be repeated.)
-F <rfile> - Read a list of regular expressions from <rfile> to find
-f <regex> - find occurrences of <regex>; may be repeated.
results go into find.txt
-q nn - Quiet Rate; only print every nn status reports. Default 0; -1 for no status at all
-s frac[:passes] - Set random sampling parameters
Tuning parameters:
-C NN - specifies the size of the context window (default 16)
-S fr:<name>:window=NN specifies context window for recorder to NN
-S fr:<name>:window_before=NN specifies context window before to NN for recorder
-S fr:<name>:window_after=NN specifies context window after to NN for recorder
-G NN - specify the page size (default 16777216)
-g NN - specify margin (default 4194304)
-j NN - Number of analysis threads to run (default 4)
-M nn - sets max recursion depth (default 7)
-m <max> - maximum number of minutes to wait after all data read
default is 60
Path Processing Mode:
-p <path>/f - print the value of <path> with a given format.
formats: r = raw; h = hex.
Specify -p - for interactive mode.
Specify -p -http for HTTP mode.
Parallelizing:
-Y <o1> - Start processing at o1 (o1 may be 1, 1K, 1M or 1G)
-Y <o1>-<o2> - Process o1-o2
-A <off> - Add <off> to all reported feature offsets
Debugging:
-h - print this message
-H - print detailed info on the scanners
-V - print version number
-z nn - start on page nn
-dN - debug mode (see source code)
-Z - zap (erase) output directory
Control of Scanners:
-P <dir> - Specifies a plugin directory
Default dirs include /usr/local/lib/bulk_extractor /usr/lib/bulk_extractor and
BE_PATH environment variable
-e <scanner> enables <scanner> -- -e all enables all
-x <scanner> disable <scanner> -- -x all disables all
-E <scanner> - turn off all scanners except <scanner>
(Same as -x all -e <scanner>)
note: -e, -x and -E commands are executed in order
e.g.: '-E gzip -e facebook' runs only gzip and facebook
-S name=value - sets a bulk extractor option name to be value
Settable Options (and their defaults):
-S work_start_work_end=YES Record work start and end of each scanner in report.xml file ()
-S enable_histograms=YES Disable generation of histograms ()
-S debug_histogram_malloc_fail_frequency=0 Set >0 to make histogram maker fail with memory allocations ()
-S hash_alg=md5 Specifies hash algorithm to be used for all hash calculations ()
-S dup_data_alerts=NO Notify when duplicate data is not processed ()
-S write_feature_files=YES Write features to flat files ()
-S write_feature_sqlite3=NO Write feature files to report.sqlite3 ()
-S report_read_errors=YES Report read errors ()
-S carve_net_memory=NO Carve network memory structures (net)
-S word_min=6 Minimum word size (wordlist)
-S word_max=14 Maximum word size (wordlist)
-S max_word_outfile_size=100000000 Maximum size of the words output file (wordlist)
-S wordlist_use_flatfiles=YES Override SQL settings and use flatfiles for wordlist (wordlist)
-S ssn_mode=0 0=Normal; 1=No `SSN' required; 2=No dashes required (accts)
-S min_phone_digits=7 Min. digits required in a phone (accts)
-S exif_debug=0 debug exif decoder (exif)
-S jpeg_carve_mode=1 0=carve none; 1=carve encoded; 2=carve all (exif)
-S min_jpeg_size=1000 Smallest JPEG stream that will be carved (exif)
-S zip_min_uncompr_size=6 Minimum size of a ZIP uncompressed object (zip)
-S zip_max_uncompr_size=268435456 Maximum size of a ZIP uncompressed object (zip)
-S zip_name_len_max=1024 Maximum name of a ZIP component filename (zip)
-S unzip_carve_mode=1 0=carve none; 1=carve encoded; 2=carve all (zip)
-S rar_find_components=YES Search for RAR components (rar)
-S rar_find_volumes=YES Search for RAR volumes (rar)
-S unrar_carve_mode=1 0=carve none; 1=carve encoded; 2=carve all (rar)
-S gzip_max_uncompr_size=268435456 maximum size for decompressing GZIP objects (gzip)
-S pdf_dump=NO Dump the contents of PDF buffers (pdf)
-S pdf_dump=NO Dump the contents of PDF buffers (msxml)
-S winpe_carve_mode=1 0=carve none; 1=carve encoded; 2=carve all (winpe)
-S opt_weird_file_size=157286400 Threshold for FAT32 scanner (windirs)
-S opt_weird_file_size2=536870912 Threshold for FAT32 scanner (windirs)
-S opt_weird_cluster_count=67108864 Threshold for FAT32 scanner (windirs)
-S opt_weird_cluster_count2=268435456 Threshold for FAT32 scanner (windirs)
-S opt_max_bits_in_attrib=3 Ignore FAT32 entries with more attributes set than this (windirs)
-S opt_max_weird_count=2 Number of 'weird' counts to ignore a FAT32 entry (windirs)
-S opt_last_year=2023 Ignore FAT32 entries with a later year than this (windirs)
-S xor_mask=255 XOR mask value, in decimal (xor)
-S sqlite_carve_mode=2 0=carve none; 1=carve encoded; 2=carve all (sqlite)
These scanners disabled by default; enable with -e:
-e base16 - enable scanner base16
-e facebook - enable scanner facebook
-e outlook - enable scanner outlook
-e sceadan - enable scanner sceadan
-e wordlist - enable scanner wordlist
-e xor - enable scanner xor
These scanners enabled by default; disable with -x:
-x accts - disable scanner accts
-x aes - disable scanner aes
-x base64 - disable scanner base64
-x elf - disable scanner elf
-x email - disable scanner email
-x exif - disable scanner exif
-x find - disable scanner find
-x gps - disable scanner gps
-x gzip - disable scanner gzip
-x hiberfile - disable scanner hiberfile
-x httplogs - disable scanner httplogs
-x json - disable scanner json
-x kml - disable scanner kml
-x msxml - disable scanner msxml
-x net - disable scanner net
-x pdf - disable scanner pdf
-x rar - disable scanner rar
-x sqlite - disable scanner sqlite
-x vcard - disable scanner vcard
-x windirs - disable scanner windirs
-x winlnk - disable scanner winlnk
-x winpe - disable scanner winpe
-x winprefetch - disable scanner winprefetch
-x zip - disable scanner zip
Examples
bulk_extractor -o bulk-out xp-laptop-2005-07-04-1430.img
bulk_extractor version: 1.3
Hostname: kali
Input file: xp-laptop-2005-07-04-1430.img
Output directory: bulk-out
Disk Size: 536715264
Threads: 1
Phase 1.
13:02:46 Offset 0MB (0.00%) Done in n/a at 13:02:45
13:03:39 Offset 67MB (12.50%) Done in 0:06:14 at 13:09:53
13:04:43 Offset 134MB (25.01%) Done in 0:05:50 at 13:10:33
13:04:55 Offset 201MB (37.51%) Done in 0:03:36 at 13:08:31
13:06:01 Offset 268MB (50.01%) Done in 0:03:15 at 13:09:16
13:06:48 Offset 335MB (62.52%) Done in 0:02:25 at 13:09:13
13:07:04 Offset 402MB (75.02%) Done in 0:01:25 at 13:08:29
13:07:20 Offset 469MB (87.53%) Done in 0:00:39 at 13:07:59
All Data is Read; waiting for threads to finish...
Time elapsed waiting for 1 thread to finish:
(please wait for another 60 min .)
Time elapsed waiting for 1 thread to finish:
6 sec (please wait for another 59 min 54 sec.)
Thread 0: Processing 520093696
Time elapsed waiting for 1 thread to finish:
12 sec (please wait for another 59 min 48 sec.)
Thread 0: Processing 520093696
Time elapsed waiting for 1 thread to finish:
18 sec (please wait for another 59 min 42 sec.)
Thread 0: Processing 520093696
Time elapsed waiting for 1 thread to finish:
24 sec (please wait for another 59 min 36 sec.)
Thread 0: Processing 520093696
Time elapsed waiting for 1 thread to finish:
30 sec (please wait for another 59 min 30 sec.)
Thread 0: Processing 520093696
All Threads Finished!
Producer time spent waiting: 335.984 sec.
Average consumer time spent waiting: 0.143353 sec.
*******************************************
** bulk_extractor is probably CPU bound. **
** Run on a computer with more cores **
** to get better performance. **
*******************************************
Phase 2. Shutting down scanners
Phase 3. Creating Histograms
ccn histogram... ccn_track2 histogram... domain histogram...
email histogram... ether histogram... find histogram...
ip histogram... tcp histogram... telephone histogram...
url histogram... url microsoft-live... url services...
url facebook-address... url facebook-id... url searches...
Elapsed time: 378.5 sec.
Overall performance: 1.418 MBytes/sec.
Total email features found: 899