DarkWeb Crawling using Focused and Classified Algorithm

  • Putri Rahmasari Yunelfi Telkom University
  • Yudha Purwanto
  • Muhammad Faris Ruriawan
  • Agus Setiawan Popalia
  • Fina Fahrani

Abstract

Cases of illegal goods transactions and personal data leaks are increasingly common. Illegal transactions and sales of personal data are usually carried out on the deep web, especially the dark web, because it uses multiple layers of encryption and gives its users anonymous access. Even setting aside illegal transactions and sales of personal data, the web is very wide and deep. A crawling method can therefore be used to explore the dark web. Crawling the dark web can use focused crawling, which concentrates on a particular topic. Focused crawling works at the URL level, examining the URLs linked from the main page for the desired topic. The crawl is started by entering the keywords that best match the desired topic. Focused crawling is expected to produce the largest possible set of URLs related to a particular topic. The results of the dark web crawling system are also expected to show how many URLs relate to a given topic. In addition, the crawl results can serve as a source of information for further research on the dark web.
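The focused crawling idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the scoring rule (fraction of keywords found in the page), the `threshold` and `max_pages` parameters, and the `fetch` callable are all assumptions introduced here for illustration. `fetch` is a hypothetical function that takes a URL and returns the page HTML or `None`; how pages are actually retrieved from the dark web is left outside the sketch.

```python
import heapq
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def relevance(text, keywords):
    """Fraction of keywords that occur in the text (case-insensitive).

    Assumed scoring rule; the text here is the raw HTML, markup included.
    """
    lowered = text.lower()
    hits = sum(1 for kw in keywords if kw.lower() in lowered)
    return hits / len(keywords) if keywords else 0.0


def focused_crawl(seed_url, keywords, fetch, threshold=0.5, max_pages=50):
    """Return URLs of on-topic pages reachable from seed_url.

    Links are followed only from pages whose score reaches the threshold,
    and links found on higher-scoring pages are visited first.
    """
    frontier = [(-1.0, seed_url)]  # min-heap on negated score = best-first
    seen = {seed_url}
    relevant = []
    visited = 0
    while frontier and visited < max_pages:
        _, url = heapq.heappop(frontier)
        html = fetch(url)  # hypothetical fetcher supplied by the caller
        visited += 1
        if html is None:
            continue
        score = relevance(html, keywords)
        if score < threshold:
            continue  # off-topic page: do not expand its links
        relevant.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            link = urljoin(url, href)  # resolve relative links
            if link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (-score, link))
    return relevant
```

The length of the returned list gives the number of URLs found for the topic, which is the kind of count the abstract refers to. Passing the fetcher in as an argument keeps the sketch independent of any particular network setup, so it can be exercised with an in-memory dictionary of pages.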

Published
2022-08-30
How to Cite
YUNELFI, Putri Rahmasari et al. DarkWeb Crawling using Focused and Classified Algorithm. [CEPAT] Journal of Computer Engineering: Progress, Application and Technology, [S.l.], v. 1, n. 02, p. 1-6, aug. 2022. ISSN 2963-6728. Available at: <//journals.telkomuniversity.ac.id/cepat/article/view/4879>. Date accessed: 03 may 2024. doi: https://doi.org/10.25124/cepat.v1i02.4879.