Our client is a Berlin-based, remote-first scale-up providing cutting‑edge market intelligence and software solutions to the automotive industry. As the company enters an exciting new phase of growth, they are looking for an experienced Python Scraping Developer to strengthen their international, high‑impact team.
If you thrive on tackling complex data extraction challenges, building highly scalable web crawlers, and ensuring large-scale scraping systems run flawlessly in production, this role is for you. You will own the entire lifecycle of their high-volume scraping pipelines, ensuring the data they collect is accurate, consistent, and delivered quickly.
Responsibilities
Design & Development: Develop, test, and deploy robust web scraping scripts and crawlers using advanced Python tools (Playwright, Selenium, Requests, BeautifulSoup, etc.).
Scalability: Architect and maintain asynchronous scraping systems capable of large-scale, high-throughput data extraction.
Resilience: Implement, monitor, and optimize sophisticated anti-blocking strategies and proxy rotation to ensure high reliability and uptime.
Integration: Manage and automate data ingestion pipelines and seamless integrations with external REST APIs.
Operational Excellence: Debug, monitor, and continuously improve scraper performance, reliability, and data quality.
Collaboration: Partner with other engineers to enhance the core scraping infrastructure, tooling, logging, and monitoring systems.
DevOps Support: Assist with DevOps tasks, including Docker, CI/CD, and managing Linux environments.
Requirements
Core Experience: Proven, hands-on professional experience in high-volume web scraping and data extraction using Python.
Technical Depth: Solid understanding of HTML parsing, browser automation techniques, and asynchronous programming.
Frameworks: Proficiency with leading web scraping frameworks (e.g., Playwright, Scrapy, or Selenium).
Web Knowledge: Strong knowledge of REST APIs, HTTP protocols, and effective proxy management.
Database Skills: Familiarity with both SQL and NoSQL databases for efficient data storage and processing.
Infrastructure: Experience with Docker, Linux environments, and version control (Git).
Communication: Fluent in English (written and spoken).
Mindset: Self-driven, detail-oriented, and capable of taking full ownership of significant projects.
Nice to Haves (Bonus Points)
Experience with advanced async libraries (e.g., asyncio).
Understanding of data quality validation and pipeline monitoring tools.
What they offer
Impact & Ownership: A high degree of freedom and the opportunity to have a meaningful, measurable impact on a growing scale-up business.
Flexibility: A high degree of flexibility – our client is a remote-first company and actively supports remote work.
Growth: A competitive compensation package and dedicated support for your personal & professional development (ongoing training & coaching).
Team & Atmosphere: A great work atmosphere within a small, talented, and international team.
Office (Optional): A modern office located on the campus of Wildau Tech University, easily accessible by public transport (just outside Berlin).