Ask HN: Are my HPC professors right? Is Python worthless compared to C?

14 points by megaloblasto 3 days ago

I'm a PhD student implementing a finite element code. It simulates electromagnetic waves passing through heterogeneous material. This code has to run in parallel, and run fast. I've been using old C libraries like PETSc to do this, and honestly, I do not enjoy working with C at all. It's esoteric and difficult to understand, and just overall feels like I'm using a tool from the 70s.

I want to rewrite my simulation in Python. Every single HPC professor I had told me that Python is worthless for HPC and I should use C or C++ (they generally think Rust is interesting but don't recommend it).

I don't understand this way of thinking. My thought is to write it in Python, profile it, and if needed, rewrite the slow parts in C. I can use CuPy to run my code on a GPU, or mpi4py to run it in parallel with MPI. If I get my code working and prove that what I want to do is possible, but still need more performance, then I can write it in C as a last step.
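The profile-first plan can be sketched with Python's built-in cProfile and pstats modules. This is a minimal sketch, not the OP's code: `assemble_naive` and `solve_driver` are hypothetical stand-ins for a slow kernel and the driver that calls it. The point is that the cumulative-time report shows which function is worth rewriting in C before any porting starts.

```python
import cProfile
import io
import pstats


def assemble_naive(n):
    # Hypothetical hot kernel: a pure-Python double loop, the kind of code
    # that is usually worth porting to C once profiling confirms it is slow.
    total = 0
    for i in range(n):
        for j in range(n):
            total += i * j
    return total


def solve_driver(n):
    # Hypothetical driver: cheap setup plus a call into the kernel.
    params = {"n": n}
    return assemble_naive(params["n"])


def profile_report(func, *args, top=5):
    """Run func(*args) under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buf.getvalue()


if __name__ == "__main__":
    result, report = profile_report(solve_driver, 300)
    print(report)  # assemble_naive should account for nearly all the time
```

Only functions that dominate a report like this need a C (or GPU) version; everything else can stay in Python.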

What do you think? Should a young PhD student in HPC really be investing all their time in C and not consider Python as a reasonable solution?

bhaney 3 days ago

> This code has to run in parallel, and run fast

Then you're certainly not going to get away with writing it all in Python, but it's a very common paradigm to write hotter parts of the code in faster languages and then glue everything together in Python. I don't see why that wouldn't work here.
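To make the glue pattern concrete, here is a minimal sketch using Python's standard-library ctypes module to call an already-compiled C function, the C standard library's `qsort`. It assumes Linux or macOS, where `ctypes.util.find_library("c")` finds libc or `CDLL(None)` falls back to symbols already loaded into the interpreter. In a real project the shared library would be your own compiled kernel; here the comparison callback is itself Python, so this only shows the mechanics of crossing the language boundary, not a speedup.

```python
import ctypes
import ctypes.util

# Load the C standard library. If find_library returns None, CDLL(None)
# exposes the symbols already linked into the Python process (POSIX only).
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# C signature: void qsort(void *base, size_t nmemb, size_t size,
#                         int (*compar)(const void *, const void *));
CMPFUNC = ctypes.CFUNCTYPE(ctypes.c_int,
                           ctypes.POINTER(ctypes.c_double),
                           ctypes.POINTER(ctypes.c_double))
libc.qsort.restype = None
libc.qsort.argtypes = [ctypes.c_void_p, ctypes.c_size_t,
                       ctypes.c_size_t, CMPFUNC]


def _compare(a, b):
    # Comparison callback invoked from C: returns -1, 0 or 1.
    return (a[0] > b[0]) - (a[0] < b[0])


_cmp = CMPFUNC(_compare)  # keep a reference so it is not garbage-collected


def c_sort(values):
    """Copy floats into a C double array, sort it with qsort, return a list."""
    arr = (ctypes.c_double * len(values))(*values)
    libc.qsort(arr, len(arr), ctypes.sizeof(ctypes.c_double), _cmp)
    return list(arr)


print(c_sort([3.5, -1.0, 2.25, 0.0]))
```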

> My thought is to write it in Python, profile it, and if needed, rewrite the slow parts in C

That's a very reasonable and common approach if you aren't already confident in which parts will need the extra performance ahead of time.

> Should a young PhD student in HPC really be investing all their time in C and not consider Python as a reasonable solution?

You should absolutely be using both together, each to their respective strengths. The only thing unreasonable about any of this is the idea of pitting the languages against each other and acting like one needs to win.

warner25 2 days ago

I'm an old PhD student. I've seen cases where an easy / naive solution written in Python can be orders-of-magnitude slower than a solution written in C, but I suspect that you're right: that a thoughtful / clever use of Python should be perfectly fine. I've also seen that professors don't know everything, and what they do know can be dated.

More importantly, though, I think a young PhD student should not pick a fight with his advisor and committee members or try to prove them wrong. Generally, do what they suggest and give them credit for it, or at least thank them enthusiastically for the suggestions and don't make a big deal about not following them. You're at their mercy for getting the PhD, and it's subjective, and their opinion of you probably matters at least as much as their opinion of your work. This is one of the things that I learned from ~15 years of professional work under many different bosses before starting my PhD program, and something that I think many young students still need to learn.

  • megaloblasto a day ago

    I appreciate your advice. I assure you that I have no intention of picking a fight with any of them, or necessarily of proving them wrong. It is almost in my nature to always question the way people do things, and to never believe something just because it is what everyone does (even when the people doing it are very smart). I think in general people appreciate this attitude, and even the occasional stubbornness, as long as I am respectful in my pursuit.

GianFabien 3 days ago

I did my PhD after 20+ years in industry, lots of it programming in C on AIX/HPUX/Solaris systems. I ended up learning Python to complete my research work.

Granted, compared with Python, C is verbose and the edit-compile-debug-run cycle is a drag. However, C as a language is not too bad. It is the libraries and APIs that slow me down. Oftentimes the abstractions are too leaky or a poor fit for what needs to be done.

What works for me is to test and refine algorithms in Python. When it works well, then I use the Python code as pseudo-code and translate to even more optimized C code. It helps to modularize your Python code, so that you only need to port the performance critical portions to C and the rest can remain in Python.
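One way to follow that advice in a finite element code is to keep the hot loop behind a small function boundary, so it can later be swapped for a compiled version without touching the driver. The sketch below is a hypothetical 1D example (linear elements for -u'' = f on a uniform mesh, not the OP's electromagnetic solver); `element_stiffness` and `assemble` are illustrative names, and dense nested lists are used only to keep it dependency-free.

```python
def element_stiffness(h):
    # Local stiffness matrix of a 1D linear element of length h for -u'' = f.
    k = 1.0 / h
    return [[k, -k], [-k, k]]


def assemble(n_elements, length=1.0):
    """Assemble the global stiffness matrix on a uniform mesh of [0, length].

    This element loop is the part profiling would typically flag. Because it
    is isolated here, it can later be ported to C (or vectorized) while the
    rest of the program stays in Python.
    """
    n_nodes = n_elements + 1
    h = length / n_elements
    K = [[0.0] * n_nodes for _ in range(n_nodes)]
    for e in range(n_elements):
        ke = element_stiffness(h)
        nodes = (e, e + 1)  # global node numbers of this element
        for a in range(2):
            for b in range(2):
                K[nodes[a]][nodes[b]] += ke[a][b]
    return K


print(assemble(2))
```

The Python version doubles as the reference implementation: a later C port of `assemble` can be checked against it on small meshes.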

Of course, you still need to learn and gain experience with C. Personally I wouldn't put much faith in Python to C transpilers. For optimal performance, you really need to understand your algorithms and data structuring. These days understanding how caching and locality of code and data impacts performance is crucial to writing performant code.

BTW, have you considered using CUDA, etc., for your finite element code? GPGPUs are ideal for that sort of computation, with lots of potential for parallelization.

  • megaloblasto 3 days ago

    This is a great response. I really appreciate it, thank you. I think this will be my approach going forward. It's so easy to translate what I'm thinking into Python. Once I profile it and figure out what needs to be optimized, I'll write it in C. I realize that gaining experience with C can be helpful, and I am improving. I just really don't enjoy coding in C.

    • badpun 2 days ago

      The problem with Python is that it can be so slow that you might be waiting for your algorithms to converge for a long time during each run. However, I agree that it is in general useful for sketching.

tripplyons 3 days ago

If you're studying HPC, I would say that it's probably worth learning how to do it the hard way, especially if you will have other projects that require doing so.

However, if you want to use Python, I would consider JAX. It has a nice set of operations based on numpy and scipy and can compile to many backends (GPU, TPU, CPU, etc.) using XLA. The compiler is great at finding ways to optimize routines that you might not think of. Some of the manual parallelism functionality is still considered experimental, but I haven't seen that cause any issues or prevent functionality.

  • megaloblasto 24 minutes ago

    This is great. I hadn't heard of JAX but it is the sort of project that I was envisioning for solving this problem. Thanks!

codingdave 2 days ago

Few things are black and white to the point they should be saying a tool is worthless. Nor should we be declaring a binary right/wrong on their take on it. There are nuances and grey areas to all things, so the question is less "Are they right?", and more "What drives them to say such a thing?"

gregjor 3 days ago

Other commenters gave excellent and actionable answers to your question. I want to quibble about your reaction to C.

> I do not enjoy working with C at all. It's esoteric and difficult to understand, and just overall feels like I'm using a tool from the 70s.

Esoteric means "intended for or likely to be understood by only a small number of people with a specialized knowledge or interest." That does not describe C, a language widely understood and used by a large number of programmers, across application domains and programming interests.

Difficult to understand describes a reaction you have to learning C, not a property of the language. Again a very large number of programmers understand and use C, have for decades, and a huge amount of C code gets written and maintained constantly. The C language includes very few keywords and a simple syntax, and a small standard library compared to Python. People new to C usually trip over memory management, pointers, and the overall philosophy behind C, not learning the language itself.

C does date back to the early 1970s, but so does most hardware and software technology we use today. Newer does not equal better, and C has remained relevant and popular despite its age because it works so well. Toyota introduced the Corolla in the mid-1960s and it remains relevant and widely used today, not to mention influential in the automobile industry. C occupies a similar position, a language that works so well it has staying power and has undergone relatively minor updates over time, unless you count derivative languages that expand on and perhaps improve on C -- C++, Go, Rust, Zig, many others.

Good luck with your project.

  • megaloblasto a day ago

    Thanks for your reply.

    "likely to be understood by only a small number of people with a specialized knowledge or interest" is the meaning I was trying to convey here. I think C is more difficult than it should be, but it persists for two main reasons. First, many domain experts seem to regard it in an overly positive light, and second, there is a huge amount of fundamental software written in C (as you pointed out), which has inertia sustaining the language's importance.

    > Difficult to understand describes a reaction you have to learning C, not a property of the language

    I think this is true to an extent. Take the following example. I have a C API that requires me to write a callback function and pass it as an argument to a function called boundaryConditions. boundaryConditions can actually take a list of functions, but by far the most common use is to pass it a single function. As you probably know, to pass a function as a parameter in C you have to use function pointers. That means that if I want to pass a callback function to boundaryConditions, I need to create a function, then create a pointer to that function, then create an array of function pointers that contains my function pointer, and pass that as my argument. I don't see a world where this is the optimal way of doing things (if I'm wrong, please enlighten me), and I tend to find strange, quirky behavior like this all the time when writing C.

    This phenomenon of a bad language sticking around is not unheard of in programming (JavaScript, for example). I think the Corolla is still around because it is a great car, and people want it. I think C is still around because of inertia and overly positive views of the tool.

    • gregjor 15 hours ago

      > I think C is more difficult than it should be

      I think if you view C as a cross-platform macro assembler it makes more sense. You can read the origin story of the C language. The main author or inventor of C, Dennis Ritchie, intended to make a language that targeted the capabilities of hardware available at the time (circa 1972). The goal was to write a reasonably high-level language that did not create significant performance issues compared to assembly language or other languages available at the time. If you go back to that era (my career started in the late '70s) every computer manufacturer had their own instruction set architecture (ISA) and their own proprietary languages and tools (frequently variants of FORTRAN, COBOL, etc.). To make a cross-platform operating system (what we now know as Unix) the team needed a cross-platform language that could express what the hardware of the day could do. C models how almost all digital computers work at a low level, but not as low as the architecture-dependent assembly language.

      The authors of C and Unix did optimize the language and OS and associated tools, but they optimized them for the available hardware, out of necessity. They did not optimize for programmers who didn't know assembly language or struggled to understand pointers.

      C has remained relevant in part because of what you call inertia -- huge amounts of code already written. The language has evolved and changed over time, but not much, because the underlying principles of digital computers that C models have not changed much.

      > As you probably know, to pass a function as a parameter in C you have to use function pointers

      Yes. C does not have first-class functions. It doesn't have first-class strings or arrays either. C has primitive data types -- char, int, float, pointer -- that correspond directly to the kinds of things CPUs work with. Again the author of C did not set out to make a general-purpose language for application programmers like us. Languages with first-class functions already existed at the time (Lisp predates C by more than a decade). Functions don't exist at the assembly language level, so while C has functions it does not let you pass them around other than with pointers -- the same way it lets you pass "strings" around.

      Of course anyone used to using a language like Python or Javascript, which do not intend to model CPUs in a way that minimizes performance cost, will find C frustrating because it leaves it to the programmer to do things like get a pointer to a function and put it into an array. Many modern languages have syntactic sugar so we can pass functions around, which makes C look primitive and clunky by comparison.

      > I don't see a world where this is the optimal way of doing things

      I tried to describe that world. You can look at it a couple of ways. C comes close to optimally modeling ISAs, which rely heavily on indirection (pointers) and a small number of primitive data types. Other languages from the same era did that less optimally and we've mostly forgotten them. C also gives programmers just enough expressive power to write code that directly translates to assembly language but hides a large amount of CPU ISA-specific details.

      Compare how C passes an array of function pointers with trying to do the same thing from Python (to a C API). You probably have some helper functions that handle the significant mismatch between Python data types and functions (in an interpreted language) and C functions, but if you had to roll that yourself it would look a lot more baroque and fragile than the C version.
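      For comparison, here is roughly what building that callback array by hand from Python looks like with the standard-library ctypes module. This is a minimal sketch under assumptions: the C callback type `typedef double (*bc_fn)(double x, double y);` and a signature like `void boundaryConditions(bc_fn *fns, int n);` are guesses at the API described earlier in the thread, and `call_all` is a hypothetical pure-Python stand-in for what such a C function would do with the array, since no real library is loaded here.

```python
import ctypes

# Assumed C callback type: typedef double (*bc_fn)(double x, double y);
BCFunc = ctypes.CFUNCTYPE(ctypes.c_double, ctypes.c_double, ctypes.c_double)


def dirichlet_zero(x, y):
    # Example boundary condition: u = 0 on the boundary.
    return 0.0


def linear_profile(x, y):
    # Example boundary condition: u = x + 2y.
    return x + 2.0 * y


# Wrap each Python function as a C function pointer. These wrapper objects
# must stay referenced for as long as C code might call them.
callbacks = [BCFunc(dirichlet_zero), BCFunc(linear_profile)]

# Equivalent of the C declaration:
#   bc_fn fns[2] = {dirichlet_zero, linear_profile};
fns = (BCFunc * len(callbacks))(*callbacks)


def call_all(fns, n, x, y):
    # Hypothetical stand-in for the C side: call each function pointer in turn.
    return [fns[i](x, y) for i in range(n)]


# With a real shared library this would be something like (hypothetical):
#   lib.boundaryConditions(fns, len(callbacks))  # after setting argtypes
print(call_all(fns, len(callbacks), 1.0, 2.0))
```

      Each Python callback crosses the C boundary through an interpreter-managed thunk, which is part of why calling back into Python from a tight C loop is slow.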

      C abstracts CPU instruction set architectures just enough to make programming an order of magnitude easier without imposing a big penalty. Python, Javascript, and many other higher-level languages abstract and wrap C code to make programming easier as well, but impose significant performance penalties. Sometimes we don't care about or can live with those penalties, sometimes we can't, as in the problem you described in the original post.

      • megaloblasto 13 hours ago

        > I tried to describe that world.

        I think you've done a very good job of explaining that world. Perhaps I was originally misdirecting my frustration at C, and it should have been at the fact that I am trying to solve such a computationally intensive problem that I cannot afford to incur the performance penalty caused by the abstractions in languages like Python.

        This really helped me understand the use of C, thank you very much.

        I wonder if one day soon computers will be powerful enough, and compilers optimized enough, that even our biggest scientific simulations can be written in highly abstract programming languages. I have a feeling that we will still be wanting to harness every little bit of computational power we can for a long time to come, and low-level languages like C and Rust will remain necessary.

        Thanks again.

        • gregjor 9 hours ago

          Glad to have a useful exchange on HN.

          The difference between C and Python comes down to layers of abstraction and conveniences for the programmer. Those abstractions and conveniences come at a cost measured in performance, another way to say demands on the hardware. Computer hardware along with compiler and related technology has come a long way since C got introduced, but of course C benefits from those improvements so it remains relatively fast compared to more abstracted languages such as Python. Writing C also demands more low-level knowledge from the programmer than writing in Python or Javascript, which limits C's accessibility to programmers who just have a job to do.

          Right now many people expect or hope that AI in the form of LLMs will let us generate efficient code from human-language descriptions, eliminating the need for expert C programmers to write and optimize code "by hand." I don't share those expectations; I see mostly hype and a lot of work trying to get AI to mimic a human programmer (e.g. writing Python code rather than giving optimal solutions to problems). My decades in the software business have perhaps made me jaded and cynical -- I have heard grandiose claims about revolutions in programming before. But you never know, maybe someone (or some AI) will figure out how to bridge the gap between requirements imperfectly expressed by people and what digital computers can efficiently do.

DamonHD 3 days ago

Python will be much slower than C, but if you follow the path that you suggest and use Python mainly as glue to stick C-based performance critical parts together you could be fine.

I used to edit an HPC trade rag, and I've written a lot of performance-critical code in C and C++, and even in Java, e.g. for high-speed trading.

As a now-old fresh PhD student I think that your profs are probably wrong!

pyb 2 days ago

> My thought is to write it in Python, profile it, and if needed, rewrite the slow parts in C.

Your thinking is correct, though sometimes, even with the best intentions, people stop at the first step. Fixing code that already works is usually not a high-priority task.

bjourne 2 days ago

You're in academia. The goal isn't to write fast code; it is to create (and publish!) knowledge. If it is easier for you to create knowledge in Python than in C, you should use that. In HPC, exact performance on a specific device with a specific runtime is uninteresting. What is interesting is how well the solution scales and what its bottlenecks and constraints are.
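To put a number on "how well the solution scales", one standard back-of-the-envelope tool is Amdahl's law (added here for illustration; the commenter did not mention it): if a fraction p of the work parallelizes perfectly over n workers and the rest stays serial, the ideal speedup is 1 / ((1 - p) + p / n). The p and n values below are arbitrary examples, not measurements.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Ideal speedup when only `parallel_fraction` of the work parallelizes."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if workers < 1:
        raise ValueError("workers must be >= 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / workers)


# Example: 95% of the runtime is parallel, the rest is a serial bottleneck.
for n in (1, 8, 64, 1024):
    print(n, round(amdahl_speedup(0.95, n), 2))
```

With p = 0.95 the speedup can never exceed 1 / (1 - p) = 20 however many workers are added, which is why finding the serial bottleneck tends to matter more than tuning the parallel part.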

  • megaloblasto a day ago

    My goal is not to remain in academia, it is to build medical devices.

fragmede 3 days ago

C++ seems like a good middle ground. Supports more modern features than pure C, but is compiled and is faster than Python. Also it sounds like your professors support that, vs not supporting Python.

sn9 2 days ago

Using Python for glue code and compiled native code (whether C or C++ or Rust or whatever) is a classic strategy.

Just profile your code with something like Scalene: https://github.com/plasma-umass/scalene

Alternatively, you can just write it in Julia.

thesuperbigfrog 3 days ago

There are already Python wrapper libraries for PETSc: https://petsc.org/release/petsc4py/

I am not an HPC user or developer, but I have written Python wrapper libraries for C libraries. It is fairly easy to do, but it looks like some of what you are looking for is already done.

  • PaulHoule 3 days ago

    I remember using a Lua front end to FORTRAN programs in the mid 1990s. Python is fine as a scripting language to script components written in a systems language but you don’t want to do billions of FLOPS with it.

  • megaloblasto 3 days ago

    Thank you for the reply. I do know about these wrappers, but the documentation is lacking, and there are a few things that I want to implement that are not possible in PETSc. I'm not sure I want to fork the repo and add the things that I need.

    • thesuperbigfrog 3 days ago

      As others have stated in the comments, scientific computing in pure Python is likely to be too slow and possibly difficult to deploy on HPC nodes.

      Most use of Python in scientific computing and research tends to be as a high-level "glue" or scripting language that calls extremely optimized C, C++, or FORTRAN libraries. Most of the libraries involved are decades old, very mature, and widely used.

      Depending on what you are trying to do, it might be possible to use CUDA or other GPU-based parallelization libraries.

      Rust-based libraries might exist, but are likely to be considerably younger, possibly buggy for some problem-set edge cases, and less widely used than their corresponding C, C++, or FORTRAN counterparts.

      Ambition and building new things to solve problems is always good, but your professors know what works. If you are building new things, I would make sure to compare your work with existing solutions to verify that the new stuff is behaving correctly.

      Best of luck in your endeavors!

softwaredoug 15 hours ago

Frankly, you will learn more if you try it and fail (or maybe succeed!?) than by listening to the advice of professors or random internet comments.

  • megaloblasto 13 hours ago

    I promise you that I have tried and failed more times than I can count. Sometimes getting the perspective of other people can be very helpful.