Tim Santeford

PySpark

How To Guides

Log Record Counts in PySpark (With a Timer)

March 29, 2025 by Tim Santeford

When I’m working with large datasets in PySpark, I often need to know how many records are flowing through my transformations. It’s a simple thing, but being able to log that information at the right time can help me catch issues early—like unexpected filters wiping out rows or joins ballooning in size. What makes it

Error Fixes

Fixing PySpark Import Errors When Using Custom Modules

March 17, 2025 by Tim Santeford

Encountering a ModuleNotFoundError while running a PySpark job can be frustrating, especially when the module exists and works perfectly outside of Spark. I recently ran into this exact issue while working with a Spark job that imported a function from a local utils module. Everything seemed fine until I tried to use a PySpark UDF,


I'm Tim, a software engineer with two decades of experience, specializing in full-stack development, generative AI integration, interface design, and databases, leveraging technology to empower businesses.

© 2025 Tim Santeford