DarkWeb Crawling using Focused and Classified Algorithm
DOI:
https://doi.org/10.25124/cepat.v1i02.4879
Keywords:
Focused Crawling, Dark Web, TOR, URL
Abstract
Cases of illegal goods transactions and leaked personal data are increasingly common. Illegal transactions and sales of personal data are usually carried out on the deep web, especially the dark web, because it has multiple layers of encryption and gives users anonymity when accessing it. Even apart from illegal transactions and personal data sales, the web is very wide and deep. Crawling can therefore be used to explore the dark web. One suitable technique is focused crawling, which concentrates on a particular topic. Focused crawling works at the URL level: starting from a main page on the desired topic, it examines the URLs that are linked to that page. The crawl is driven by keywords chosen to best match the desired topic. With this method, it is hoped that the largest possible set of URLs related to a particular topic can be collected. The results of crawling the dark web are also expected to show how many URLs relate to a given topic. In addition, the crawl results can serve as a source of information for further research on the dark web.
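The keyword-driven, link-following idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the relevance measure (fraction of keywords found in the page), the threshold, the breadth-first frontier, and the names `focused_crawl`, `relevance`, and `LinkExtractor` are all assumptions made for illustration. Page retrieval is passed in as a `fetch` function so the example runs offline against a toy set of pages; a real dark web crawl would route requests through Tor, which is not shown here.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def relevance(text, keywords):
    """Fraction of keywords found in the raw page text (case-insensitive).

    Assumed scoring rule for this sketch; the source does not specify one.
    """
    lowered = text.lower()
    hits = sum(1 for kw in keywords if kw.lower() in lowered)
    return hits / len(keywords) if keywords else 0.0


def focused_crawl(seed_url, keywords, fetch, threshold=0.5, max_pages=50):
    """Breadth-first crawl that only expands pages judged relevant.

    fetch(url) must return the page HTML as a string, or None on failure.
    Returns the relevant URLs in the order they were visited.
    """
    frontier = deque([seed_url])
    seen = {seed_url}
    relevant = []
    fetched = 0
    while frontier and fetched < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        fetched += 1
        if html is None or relevance(html, keywords) < threshold:
            continue  # off-topic or unreachable: do not follow its links
        relevant.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            link = urljoin(url, href)  # resolve relative links against the page URL
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return relevant


# Toy in-memory "site" standing in for real pages (hypothetical URLs).
PAGES = {
    "http://seed.example/": '<a href="/market">m</a> <a href="/blog">b</a> leaked data market',
    "http://seed.example/market": '<a href="/dump">d</a> market for leaked data',
    "http://seed.example/blog": '<a href="/recipes">r</a> cooking notes',
    "http://seed.example/dump": "leaked data archive",
    "http://seed.example/recipes": "leaked data (never reached)",
}

result = focused_crawl("http://seed.example/", ["leaked", "data"], PAGES.get)
print(result)
```

In this toy run the off-topic `/blog` page is fetched but not expanded, so `/recipes` is never visited even though its text matches the keywords. That is the trade-off of a focused crawl: fewer irrelevant fetches at the cost of missing relevant pages that are only reachable through irrelevant ones.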
License
CEPAT has chosen to apply the Creative Commons Attribution-NonCommercial 4.0 License (CC BY-NC 4.0) to all manuscripts to be published. Authors who publish with this journal agree to the following terms.
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgment of the work’s authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal’s published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.