Refactor the UTD Comet Calendar Scraper #68
base: develop
Conversation
I was poking around, since it seems this only scrapes events from https://calendar.utdallas.edu/calendar/1, which is less than a day's worth of events. Check this out: https://calendar.utdallas.edu/api/2/events?days=365&pp=100&page=1. This seems to return all the events; we'd just need to loop through all the pages.
@TyHil, no problem! I would update the code so that it loops through the pages. How many pages would you like to scrape before the events start getting irrelevant time-wise?
Awesome! I think a year's worth of data is good, since that's in line with the Mazevo scraper. With the API URL I provided, it automatically stops after 365 days and returns an empty array. Also in
I just visited https://calendar.utdallas.edu/calendar, and realized that if we traverse from 1 to 54 in the pagination bar, it will show all the events until next year, so I'm inclined to do that. But would I need to take advantage of the URL you provided for the scraping process, or is it just for reference?
That would certainly work, and you wouldn't need to use the URL I provided. But I think the slowness of scraping each page and event with chromedp might make just calling the API a better choice. Then all our scraper really needs to do is make 12 API calls and save the results to a file.
I would certainly go for your approach of calling the API. What I will do is still add the looping in the existing function, and write another function in. And in case you agree, would you want me to fetch the data into the exact same struct as in scraping, or would it have to change a bit?
Sounds great! I don't think the data has to change, that'll be a job for the parser. |
Related to #57.
In addition to changing the selector's tag so that there wouldn't be any hangs, I've also updated the code so that it gets as much information as possible (the description, location, email, etc.).
Please let me know whether the change will benefit the data for the API, or whether I need to revert those changes.
Thank you!