Katana
Katana is a fast crawler focused on execution in automation pipelines offering both headless and non-headless crawling.
Features
- Fast And fully configurable web crawling
- Standard and Headless mode support
- JavaScript parsing / crawling
- Customizable automatic form filling
- Scope control - Preconfigured field / Regex
- Customizable output - Preconfigured fields
- INPUT - STDIN, URL and LIST
- OUTPUT - STDOUT, FILE and JSON
Installation
go install github.com/projectdiscovery/katana/cmd/katana@latest
Usage
katana [flags]
Flags
INPUT:
-u, -list string[] target url / list to crawl
-resume string resume scan using resume.cfg
CONFIGURATION:
-r, -resolvers string[] list of custom resolver (file or comma separated)
-d, -depth int maximum depth to crawl (default 3)
-jc, -js-crawl enable endpoint parsing / crawling in javascript file
-jsl, -jsluice enable jsluice parsing in javascript file (memory intensive)
-ct, -crawl-duration value maximum duration to crawl the target for (s, m, h, d) (default s)
-kf, -known-files string enable crawling of known files (all,robotstxt,sitemapxml), a minimum depth of 3 is required to ensure all known files are properly crawled.
-mrs, -max-response-size int maximum response size to read (default 9223372036854775807)
-timeout int time to wait for request in seconds (default 10)
-aff, -automatic-form-fill enable automatic form filling (experimental)
-fx, -form-extraction extract form, input, textarea & select elements in jsonl output
-retry int number of times to retry the request (default 1)
-proxy string http/socks5 proxy to use
-H, -headers string[] custom header/cookie to include in all http request in header:value format (file)
-config string path to the katana configuration file
-fc, -form-config string path to custom form configuration file
-flc, -field-config string path to custom field configuration file
-s, -strategy string Visit strategy (depth-first, breadth-first) (default "depth-first")
-iqp, -ignore-query-params Ignore crawling same path with different query-param values
-tlsi, -tls-impersonate enable experimental client hello (ja3) tls randomization
-dr, -disable-redirects disable following redirects (default false)
DEBUG:
-health-check, -hc run diagnostic check up
-elog, -error-log string file to write sent requests error log
HEADLESS:
-hl, -headless enable headless hybrid crawling (experimental)
-sc, -system-chrome use local installed chrome browser instead of katana installed
-sb, -show-browser show the browser on the screen with headless mode
-ho, -headless-options string[] start headless chrome with additional options
-nos, -no-sandbox start headless chrome in --no-sandbox mode
-cdd, -chrome-data-dir string path to store chrome browser data
-scp, -system-chrome-path string use specified chrome browser for headless crawling
-noi, -no-incognito start headless chrome without incognito mode
-cwu, -chrome-ws-url string use chrome browser instance launched elsewhere with the debugger listening at this URL
-xhr, -xhr-extraction extract xhr request url,method in jsonl output
SCOPE:
-cs, -crawl-scope string[] in scope url regex to be followed by crawler
-cos, -crawl-out-scope string[] out of scope url regex to be excluded by crawler
-fs, -field-scope string pre-defined scope field (dn,rdn,fqdn) or custom regex (e.g., '(company-staging.io|company.com)') (default "rdn")
-ns, -no-scope disables host based default scope
-do, -display-out-scope display external endpoint from scoped crawling
FILTER:
-mr, -match-regex string[] regex or list of regex to match on output url (cli, file)
-fr, -filter-regex string[] regex or list of regex to filter on output url (cli, file)
-f, -field string field to display in output (url,path,fqdn,rdn,rurl,qurl,qpath,file,ufile,key,value,kv,dir,udir)
-sf, -store-field string field to store in per-host output (url,path,fqdn,rdn,rurl,qurl,qpath,file,ufile,key,value,kv,dir,udir)
-em, -extension-match string[] match output for given extension (eg, -em php,html,js)
-ef, -extension-filter string[] filter output for given extension (eg, -ef png,css)
-mdc, -match-condition string match response with dsl based condition
-fdc, -filter-condition string filter response with dsl based condition
RATE-LIMIT:
-c, -concurrency int number of concurrent fetchers to use (default 10)
-p, -parallelism int number of concurrent inputs to process (default 10)
-rd, -delay int request delay between each request in seconds
-rl, -rate-limit int maximum requests to send per second (default 150)
-rlm, -rate-limit-minute int maximum number of requests to send per minute
UPDATE:
-up, -update update katana to latest version
-duc, -disable-update-check disable automatic katana update check
OUTPUT:
-o, -output string file to write output to
-sr, -store-response store http requests/responses
-srd, -store-response-dir string store http requests/responses to custom directory
-or, -omit-raw omit raw requests/responses from jsonl output
-ob, -omit-body omit response body from jsonl output
-j, -jsonl write output in jsonl format
-nc, -no-color disable output content coloring (ANSI escape codes)
-silent display output only
-v, -verbose display verbose output
-debug display debug output
-version display project version
Examples
$ katana -u https://tesla.com -headless -no-incognito
__ __
/ /_____ _/ /____ ____ ___ _
/ '_/ _ / __/ _ / _ \/ _ /
/_/\_\\_,_/\__/\_,_/_//_/\_,_/
projectdiscovery.io
[INF] Current katana version v1.0.5 (latest)
[INF] Started headless crawling for => https://tesla.com
https://www.tesla.com/akam/13/38f585b7
https://www.tesla.com/modules/custom/tesla_banners/js/index.js?v=1.x
https://www.tesla.com/_flysystem/s3/js/js_tM-PUnZsSWlsw4SGqbE_P0jQgnhzr5-k0JyrcXI7GiM.js
https://www.tesla.com/_flysystem/s3/js/js_ziQ6U_MBKDHsNt0vkIi8tBQ6eY2Bvs5fTiK6f1FgQMk.js
https://digitalassets.tesla.com/tesla-contents/image/upload/h_1800,w_2880,c_fit,f_auto,q_auto:best/Homepage-SolarRoof-Desktop-Global
https://digitalassets.tesla.com/tesla-contents/image/upload/f_auto,q_auto/Homepage-SolarPanels-01-Desktop
https://digitalassets.tesla.com/tesla-contents/image/upload/h_1800,w_2880,c_fit,f_auto,q_auto:best/Homepage-Model-X-Desktop-LHD
https://digitalassets.tesla.com/tesla-contents/image/upload/h_2400,w_2880,c_fit,f_auto,q_auto:best/Homepage-Model-Y-Global-Desktop
https://www.tesla.com/_flysystem/s3/js/js_0oDCLl1W4pXlevbCx0draTXjFWLA0w7glUGpeJIhMpk.js
https://www.tesla.com/themes/custom/tesla_frontend/assets/manifest.json
https://www.tesla.com/_flysystem/s3/css/css_JlDnfCdMVObK_ykl43bGoMeh6xocjbxc_mIBK3JTtkc.css
https://www.tesla.com/_flysystem/s3/css/css_eU4gcaMHzV0EaLh7hWtE-uFgeUZJL0jOoElc76Ko3Z0.css
https://www.tesla.com/_flysystem/s3/css/css_9JG-GCPRlzqh3qNEz0zqzkCbGmdrVzYp4nHXbLOHUVQ.css
https://tesla.com
https://www.tesla.com/
https://digitalassets.tesla.com/raw/upload/emea-market-assets/prod/%5C%27%7BCOOKIE_SETTINGS_URL%7D%5C%27
https://www.tesla.com/en_my
https://www.tesla.com/hr_hr
https://www.tesla.com/he_il
https://www.tesla.com/en_gb
https://www.tesla.com/en_ae
https://www.tesla.com/ar_ae
https://www.tesla.com/tr_tr
https://www.tesla.com/th_th
https://www.tesla.com/en_th
...