Refactor the UTD Comet Calendar Scraper #68
base: develop
Conversation
I was poking around, since it seems this only scrapes events from https://calendar.utdallas.edu/calendar/1, which is less than a day's worth of events. Check this out: https://calendar.utdallas.edu/api/2/events?days=365&pp=100&page=1. This seems to return all the events; we'd just need to loop through all the pages.
@TyHil, no problem! I would update the code so that it loops through the pages. How many pages would you like to scrape before the events start getting irrelevant time-wise?
Awesome! I think a year's worth of data is good, since that's in line with the Mazevo scraper. With the API URL I provided, it automatically stops after 365 days and returns an empty array. Also in
I just visited https://calendar.utdallas.edu/calendar, and realized that if we traverse from 1 to 54 in the pagination bar, it will show all the events until next year, so I'm inclined to do that. But would I need to take advantage of the URL you provided for the scraping process, or is it just for reference?
That would certainly work, and you wouldn't need to use the URL I provided. But I think the slowness of scraping each page and event with chromedp might make just calling the API a better choice. Then all our scraper really needs to do is make 12 API calls and save the results to a file.
I would certainly go for your approach of calling the API. What I will do is still add the looping in the existing function, and write another function in. And in case you agree, would you want me to fetch the data into the exact same struct as in scraping, or would it have to change a bit?
Sounds great! I don't think the data has to change, that'll be a job for the parser. |
Related to #57.
In addition to changing the selector's tag so that there wouldn't be any hangs, I've also updated the code so that it gets as much information as possible (the description, location, email, etc.).
Please let me know whether the change will benefit the data for the API, or whether I need to revert those changes.
Thank you!