Scrapy is a scraping application that extracts data from platforms such as TikTok, Facebook, and Instagram. Using a microservice architecture, Scrapy stores the extracted data in an external database for later analysis.
A session between client and server exists only after the server has responded to a request, because that response carries the cookie holding the session ID. When the client calls the TiktokScraper(N) API, the server starts scraping N posts and sends no response until it finishes, so the client has no session cookie yet. If the user then calls the Pause API, the server must stop that specific user's process, but it cannot tell which user is asking, because no session has been created yet.
Our system consists of three projects (Scrapy, DataHandler, and Management) and two resources shared between them.
The Scrapy project scrapes posts, extracts their metadata, and sends each post's download link to DataHandler.
DataHandler downloads the posts through its queue and saves the downloaded posts and their metadata to the database.
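The Scrapy-to-DataHandler hand-off can be sketched as a producer/consumer queue, assuming the two services exchange messages that carry a download link plus metadata. In this sketch, Python's in-process `queue.Queue` stands in for whatever message queue the services actually share. The `PostLink` type, the `download` placeholder, and the list used as a database are all hypothetical.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass
class PostLink:
    """Hypothetical message Scrapy sends to DataHandler."""
    url: str
    metadata: dict


def download(url: str) -> bytes:
    # Placeholder for a real HTTP download of the post.
    return f"content of {url}".encode()


def data_handler(links: "queue.Queue[PostLink | None]", db: list) -> None:
    """Consume links until a None sentinel arrives, saving each post."""
    while True:
        item = links.get()
        if item is None:  # Scrapy signalled that it is finished
            links.task_done()
            break
        db.append({"url": item.url, "content": download(item.url), **item.metadata})
        links.task_done()


def run_pipeline(posts: list[PostLink]) -> list:
    links: "queue.Queue[PostLink | None]" = queue.Queue()
    db: list = []  # stand-in for the external database
    consumer = threading.Thread(target=data_handler, args=(links, db))
    consumer.start()
    for post in posts:  # Scrapy side: enqueue download links
        links.put(post)
    links.put(None)  # sentinel: no more links
    consumer.join()
    return db
```

The queue decouples the two services. Scrapy can keep producing links while DataHandler downloads at its own pace.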
The Management app handles the scraping processes (start, stop, pause, and resume) and reports their status to the WebApp, which presents it to the client.
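The start/stop/pause/resume control described above can be modelled as a small state machine per scraping process. The sketch below assumes that a stopped process cannot be resumed and that a process must be running before it can be paused. These are assumptions, not rules stated by the project. `ProcessManager`, `Status`, and the transition table are hypothetical names.

```python
from enum import Enum
from typing import Optional


class Status(Enum):
    RUNNING = "running"
    PAUSED = "paused"
    STOPPED = "stopped"


# Allowed (current status, action) -> next status. Hypothetical rules.
_TRANSITIONS = {
    (None, "start"): Status.RUNNING,
    (Status.RUNNING, "pause"): Status.PAUSED,
    (Status.PAUSED, "resume"): Status.RUNNING,
    (Status.RUNNING, "stop"): Status.STOPPED,
    (Status.PAUSED, "stop"): Status.STOPPED,
}


class ProcessManager:
    def __init__(self) -> None:
        self._status: dict[str, Status] = {}

    def apply(self, job_id: str, action: str) -> Status:
        """Apply an action to a job, rejecting invalid transitions."""
        current: Optional[Status] = self._status.get(job_id)
        key = (current, action)
        if key not in _TRANSITIONS:
            raise ValueError(f"cannot {action} job {job_id} in state {current}")
        self._status[job_id] = _TRANSITIONS[key]
        return self._status[job_id]

    def report(self) -> dict[str, str]:
        """Status of every job, e.g. for the WebApp to display."""
        return {job_id: s.value for job_id, s in self._status.items()}
```

Keeping the allowed transitions in one table makes it easy to reject requests such as resuming a stopped job before they reach the scraper.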
The WebApp holds each user's profile and data, so users can view their profile and scraping history.
Mahmmoud A. Mahdi (Supervisor, Innovator, Team Leader)
Ali Abdulsalam Elshorpagi (Back-End Developer)
Mohamed Said Ibrahim (Back-End Developer)
Ammar Gamal Goda (Back-End Developer)
Amr Sameh Mohamed (Frontend Developer)
Basma Aiman Yousef (AI Engineer)