Commit Graph

6 Commits

Author SHA1 Message Date
Muyue
6de44556f4 Feature: Add club results extractor and course details scraper
- Add get_club_results.py: Extract all results for a club from results CSV
  - Fuzzy search for club names
  - List clubs functionality
  - Export results to CSV
- Add scrape_course_details.py: Scrape detailed info from course pages
  - Extract competition ID (frmcompetition)
  - Extract event details (distance, category, sex)
  - Extract course website link
  - Extract location and organizer info
- Update README.md with new scripts and usage examples
- Update version to v1.3.0
2026-01-03 11:03:35 +01:00
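
A minimal sketch of the fuzzy club lookup described above, assuming difflib-style matching and a results CSV with a 'club' column; the actual column names and matching logic in get_club_results.py may differ:

```python
import csv
import difflib

def find_club_results(results_csv, club_query, cutoff=0.6):
    """Return rows whose club name fuzzily matches club_query."""
    with open(results_csv, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))

    # Distinct club names, compared case-insensitively against the query.
    clubs = sorted({row["club"] for row in rows if row.get("club")})
    matches = difflib.get_close_matches(
        club_query.lower(), [c.lower() for c in clubs], n=5, cutoff=cutoff)
    keep = {c for c in clubs if c.lower() in matches}

    return [row for row in rows if row.get("club") in keep]

if __name__ == "__main__":
    hits = find_club_results("results_daily.csv", "haute saintonge")
    print(f"{len(hits)} results found")
```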
Muyue
55e6fa5292 Feature: Add resume script for FFA scraping
- New resume_scraping.py: Resume scraping from specific date
  * Designed to continue after crashes or interruptions
  * Starts from 2024-04-08 (after original script crash)
  * Continues until 2026-08-01
  * Appends to existing CSV files (no data loss)

- Handles 'invalid session id' errors
- Preserves existing data in courses_daily.csv and results_daily.csv
- Allows seamless recovery from Selenium/Chrome crashes

- Documentation in docs/SCRAPER_REPRISE.md
2026-01-02 15:50:24 +01:00
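
The recovery loop could look roughly like the sketch below: recreate the Chrome driver whenever the session dies and append each day's rows to the existing CSV. scrape_day is a placeholder for the real per-day logic in resume_scraping.py.

```python
from datetime import date, timedelta
from pathlib import Path

import pandas as pd
from selenium import webdriver
from selenium.common.exceptions import InvalidSessionIdException, WebDriverException

COURSES_CSV = Path("courses_daily.csv")

def scrape_day(driver, day):
    # Placeholder: the real script loads the FFA calendar for `day`
    # with the driver and parses the course rows.
    return [{"jour_recupere": day.isoformat()}]

def resume(start=date(2024, 4, 8), end=date(2026, 8, 1)):
    driver = webdriver.Chrome()
    day = start
    while day <= end:
        try:
            rows = scrape_day(driver, day)
        except (InvalidSessionIdException, WebDriverException):
            # Chrome session died: restart the driver and retry the same day.
            try:
                driver.quit()
            except WebDriverException:
                pass
            driver = webdriver.Chrome()
            continue
        # Append so previously scraped data is preserved.
        pd.DataFrame(rows).to_csv(COURSES_CSV, mode="a", index=False,
                                  header=not COURSES_CSV.exists())
        day += timedelta(days=1)
    driver.quit()
```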
Muyue
0bd65b1d3f Feature: Add new daily scraper approach for FFA data
- New scraper_jour_par_jour.py: Day-by-day scraping approach
  * Fixes 403/404 errors from previous method
  * Uses frmsaisonffa= (empty) parameter to avoid season filtering
  * Scrapes courses and results for each day from 01/01/2024 to 01/08/2026
  * Progressive CSV saving with 'jour_recupere' column for traceability

- New scraper_jour_par_jour_cli.py: CLI version with customizable dates
  * --start-date: Custom start date (default: 2024-01-01)
  * --end-date: Custom end date (default: 2026-08-01)
  * --no-results: Skip result fetching for faster scraping
  * --output-dir: Custom output directory

- Documentation in docs/NOUVEAU_SCRAPER.md
  * Explains problems with old approach
  * Details new day-by-day methodology
  * Usage instructions and examples

- Cleaned up: Removed temporary test scripts and debug files
2026-01-02 11:54:56 +01:00
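
The CLI flags listed above map naturally onto argparse; the sketch below mirrors those flags and the day-by-day loop, while the actual scraping call is left out (the --output-dir default is an assumption, only the date defaults come from the commit):

```python
import argparse
from datetime import date, timedelta

def parse_args():
    p = argparse.ArgumentParser(description="Day-by-day FFA scraper (sketch)")
    p.add_argument("--start-date", default="2024-01-01")
    p.add_argument("--end-date", default="2026-08-01")
    p.add_argument("--no-results", action="store_true",
                   help="Skip result fetching for faster scraping")
    p.add_argument("--output-dir", default="data")  # default is assumed
    return p.parse_args()

def day_range(start, end):
    """Yield every date from start to end, inclusive."""
    d, stop = date.fromisoformat(start), date.fromisoformat(end)
    while d <= stop:
        yield d
        d += timedelta(days=1)

if __name__ == "__main__":
    args = parse_args()
    for day in day_range(args.start_date, args.end_date):
        # One request per day sidesteps the season filtering behind the old
        # 403/404 errors; the real per-day scrape is omitted here.
        print(f"Would scrape {day} into {args.output_dir} "
              f"(results {'skipped' if args.no_results else 'included'})")
```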
Muyue
f6c8e889d5 Feature: Complete FFA scraping system with results extraction
🎯 Major achievements:
- Scraped 133,358 courses from 2010-2026 (17 years)
- Extracted 1,753,172 athlete results
- Fixed season calculation bug for December months
- Implemented ultra-fast scraping without Selenium (100x faster)

📊 Data coverage:
- Temporal: 2010-2026 (complete)
- Monthly: All 12 months covered
- Geographic: 20,444 unique locations
- Results: 190.9 results per course on average

🚀 Technical improvements:
- Season calculation corrected for FFA calendar system
- Sequential scraping for stability (no driver conflicts)
- Complete results extraction with all athlete data
- Club search functionality (found Haute Saintonge Athlétisme)

📁 New scripts:
- scrape_fast.py: Ultra-fast period scraping (requests + bs4)
- extract_results_complete.py: Complete results extraction
- combine_all_periods.py: Data consolidation tool

⏱️ Performance:
- Scraping: 16.1 minutes for 1,241 periods
- Extraction: 3 hours for 9,184 courses with results
- Total: 1,886,530 records extracted
2026-01-02 01:16:06 +01:00
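
Two of the points above lend themselves to short sketches. First, the December fix suggests a season helper in which December events count toward the following FFA season (whether other months also roll over is not stated in the commit, so it is left out). Second, the ultra-fast path replaces Selenium with plain requests + BeautifulSoup, roughly in the spirit of scrape_fast.py; the URL and parameters below are placeholders:

```python
import requests
from bs4 import BeautifulSoup

def ffa_season(year, month):
    """December events belong to the following FFA season (per the fixed bug);
    other months are assumed here to stay in the calendar year."""
    return year + 1 if month == 12 else year

def fetch_rows(url, params=None):
    """Fetch a calendar/results page over plain HTTP and return its table rows."""
    resp = requests.get(url, params=params, timeout=30)
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")
    return [[td.get_text(strip=True) for td in tr.find_all("td")]
            for tr in soup.find_all("tr")]

# Example call (placeholder URL; frmsaisonffa left empty as in the daily scraper):
# rows = fetch_rows("https://example.invalid/liste.aspx", params={"frmsaisonffa": ""})
```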
Muyue
adb49d5484 Fix the scrape_all_periods script and document its usage
- Replace data_2010_2026 with data in scrape_all_periods.py (2 occurrences)
- Add section 3.5 to the README to explain the full scraping run
- Document how the script works in 15-day periods (2010-2026)
- Explain the structure of the generated files and the automated process
- Successfully tested scraping one period (134 courses retrieved)

The scrape_all_periods.py script now makes it possible to:
- Scrape all courses from 2010 to 2026 in 15-day batches
- Use the data/ directory correctly
- Automatically merge all CSVs into data/courses/courses_list.csv
- Run the post-processing scripts automatically

💘 Generated with Crush

Assisted-by: GLM-4.7 via Crush <crush@charm.land>
2026-01-01 18:10:17 +01:00
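
A rough sketch of the 15-day batching and CSV merge described in this commit; only the data/courses/courses_list.csv target comes from the message, while the per-period file naming and exact date bounds are assumptions:

```python
from datetime import date, timedelta
from pathlib import Path

import pandas as pd

def periods(start=date(2010, 1, 1), end=date(2026, 8, 1), days=15):
    """Yield (first_day, last_day) tuples covering the range in 15-day batches."""
    d = start
    while d <= end:
        yield d, min(d + timedelta(days=days - 1), end)
        d += timedelta(days=days)

def merge_courses(data_dir="data"):
    """Merge per-period CSVs into data/courses/courses_list.csv."""
    parts = sorted(Path(data_dir).glob("courses_*.csv"))  # assumed naming
    merged = pd.concat((pd.read_csv(p) for p in parts), ignore_index=True)
    out = Path(data_dir) / "courses" / "courses_list.csv"
    out.parent.mkdir(parents=True, exist_ok=True)
    merged.to_csv(out, index=False)
    return out
```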
Muyue
a5406a4e89 Initial commit: Reorganize the FFA Calendar Scraper project
- Create a clean directory tree (src/, scripts/, config/, data/, docs/, tests/)
- Move the Python modules into src/
- Move the standalone scripts into scripts/
- Clean up temporary files and __pycache__
- Update README.md with complete documentation
- Update the imports in the scripts for the new structure
- Configure .gitignore to ignore data and logs
- Organize the data under data/ (courses, resultats, clubs, exports)

Project structure:
- src/: Main modules (ffa_scraper, ffa_analyzer)
- scripts/: CLI scripts and utilities
- config/: Configuration (config.env)
- data/: Generated data
- docs/: Documentation
- tests/: Unit tests

💘 Generated with Crush

Assisted-by: GLM-4.7 via Crush <crush@charm.land>
2026-01-01 18:05:14 +01:00
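
For the import update mentioned above, a script under scripts/ typically needs src/ on the path; a hypothetical sketch (the real scripts may wire this up differently):

```python
# scripts/example_cli.py (hypothetical) -- make src/ importable after the reorg.
import sys
from pathlib import Path

SRC = Path(__file__).resolve().parent.parent / "src"
sys.path.insert(0, str(SRC))

try:
    import ffa_scraper  # one of the main modules listed under src/
except ImportError:
    # Running this sketch outside the project tree: nothing to import.
    ffa_scraper = None
```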