Skip to content

Fix violinplot crash on empty datasets (#31700)#31707

Open
rahulrathnavel wants to merge 8 commits into
matplotlib:mainfrom
rahulrathnavel:fix-violinplot-empty
Open

Fix violinplot crash on empty datasets (#31700)#31707
rahulrathnavel wants to merge 8 commits into
matplotlib:mainfrom
rahulrathnavel:fix-violinplot-empty

Conversation

@rahulrathnavel
Copy link
Copy Markdown

PR summary

closes #31700

This PR fixes a bug where passing an empty dataset to violinplot causes it to crash (ValueError: zero-size array to reduction operation minimum), whereas boxplot handles the exact same scenario gracefully by simply drawing nothing.

Reasoning for this implementation:
I updated cbook.violin_stats to check if the input dataset is empty. If it is, it bypasses the min/max/KDE math operations and returns an empty stats dictionary for that specific dataset. I also added a safeguard in axes.violin to prevent width-scaling calculations on empty density arrays.

This allows violinplot to safely skip rendering violins for empty datasets, perfectly mirroring the resilient behavior of boxplot. I have also included a regression test to ensure this remains fixed.

AI Disclosure

I used an AI assistant strictly to help navigate the codebase, locate the specific statistics functions in cbook.py and _axes.py, and draft the boilerplate for the pytest. The core logic was manually reviewed, applied, and tested locally to ensure complete compliance with Matplotlib's standards.

PR checklist

Comment thread lib/matplotlib/cbook.py Outdated
stats['min'] = np.nan
stats['max'] = np.nan
stats['quantiles'] = np.array([])
vpstats.append(stats)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment thread lib/matplotlib/axes/_axes.py Outdated
Comment on lines +9331 to +9332
vals = np.array(stats['vals'])
vals = 0.5 * width * vals / vals.max()
if len(vals) != 0:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
vals = np.array(stats['vals'])
vals = 0.5 * width * vals / vals.max()
if len(vals) != 0:
if len(vals)>0:

while you are correct b/c len(vals) can't be negative, I think the stricter constraint is more expressive.

@rahulrathnavel
Copy link
Copy Markdown
Author

Hi @story645! The GitHub UI was throwing an error when I tried to accept the commit suggestion directly, so I pulled the branch and applied both of your changes manually locally!(since i don't why that apply suggestion button not worked for me)

_axes.py now uses the stricter > 0 constraint, and cbook.py has been updated to use the exact same 'append up here and mutate below' pattern as boxplot_stats. Let me know if everything looks good to go now!

Comment thread lib/matplotlib/tests/test_axes.py Outdated
rahulrathnavel and others added 2 commits May 20, 2026 10:55
Co-authored-by: Tim Hoffmann <2836374+timhoffm@users.noreply.github.com>
@rahulrathnavel
Copy link
Copy Markdown
Author

Hi @timhoffm, great one with the [np.nan, np.nan] edge case!

I've updated cbook.py so the NaN and inf stripping logic happens before the len(x) == 0 bailout check. The empty dataset dictionary now safely populates and avoids the crash even if the array initially contained only NaNs.

Also, just a heads-up: it looks like the AppVeyor Windows check failed right at the end of its run due to a Windows temp file PermissionError during teardown, but the actual pytest suite passed perfectly. Let me know if everything else looks good to go! and what to do next!

Comment thread lib/matplotlib/cbook.py Outdated
Comment on lines +1593 to +1595
# note tricksiness, append up here and then mutate below
vpstats.append(stats)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unconventional and makes the code harder to reason about. Instead, put the calculation into an else block:

if len(x) == 0:
     #empty stats
else:
     # calculate stats

vpstats.append(stats)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way boxplot does it is puts a continue at the end of the empty case. Might make sense here too?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @story645! I actually just refactored this loop into the strict if/else block that @timhoffm suggested above, since it lets us avoid the early append and the continue statement entirely.
The code is pushed and the linters are perfectly green! Let me know if you are both happy with this if/else structure, or if you'd prefer I switch it to the continue pattern!

Copy link
Copy Markdown
Member

@story645 story645 May 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the continue pattern is better b/c then you don't have a giant indent block for the else that you need to keep track of. That's presumably why it's used in boxplots

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see this differently: The "early return" block is almost as long as the regular block, because the majority of work is identical: configuring stats values and appending to vpstats. IMHO it's beneficial for readablility to reflected this parallelism in an if / else block with equal indentation for both cases.

On a more general note, the code is a bit fragmented and cluttered with extra variables. Directly appending a dict literal would be much cleaner:

for (x, quantile) in zip(X, quantiles):
    x = np.asarray(x)
    x = x[~(np.isnan(x) | np.isinf(x))]

    if len(x) == 0:
        vpstats.append({
            'vals': np.array([]),
            'coords': np.array([]),
            'mean': np.nan,
            'median': np.nan,
            'min': np.nan,
            'max': np.nan,
            'quantiles': np.array([]),
        })
    else:
        min_val = np.min(x)
        max_val = np.max(x)
        coords = np.linspace(min_val, max_val, points)

        vpstats.append({
            'vals': method(x, coords),
            'coords': coords,
            'mean': np.mean(x),
            'median': np.median(x),
            'min': min_val,
            'max': max_val,
            'quantiles': np.atleast_1d(np.percentile(x, 100 * quantile))
        })

But I'm not going to fight over this.

Comment thread lib/matplotlib/cbook.py Outdated
Comment on lines +1605 to +1606
vpstats.append(stats)
continue
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
vpstats.append(stats)
continue
continue

boxplot appends before the mutation. Would this work here?
https://github.com/rahulrathnavel/matplotlib/blob/0644d90addc002f66aa210178c570a17883ae7ed/lib/matplotlib/cbook.py#L1282-L1285

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @story645, appending before the mutation is actually exactly what we had in the earlier commit! However, @timhoffm mentioned above that the 'append early and mutate' pattern is unconventional and makes the code harder to reason about, which is why we moved away from it to avoid the else block entirely.

I am completely happy to change it back to the early append if you both agree that matching the boxplot structure is the top priority here! Let me know what you two decide is the best standard going forward.

(Also, as a quick heads up: the two failing Azure Pipelines Windows checks look like infrastructure flakes. They both failed with a subprocess.TimeoutExpired after some time during test_backend_getattr for backend_svg and backend_template, which seems completely unrelated to cbook.py!)-am i right !?

Comment thread lib/matplotlib/cbook.py
max_val = np.max(x)
quantile_val = np.percentile(x, 100 * q)
x = np.asarray(x)
x = x[~(np.isnan(x) | np.isinf(x))]
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be documented.

Comment thread lib/matplotlib/cbook.py Outdated
Comment on lines +1593 to +1595
# note tricksiness, append up here and then mutate below
vpstats.append(stats)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see this differently: The "early return" block is almost as long as the regular block, because the majority of work is identical: configuring stats values and appending to vpstats. IMHO it's beneficial for readablility to reflected this parallelism in an if / else block with equal indentation for both cases.

On a more general note, the code is a bit fragmented and cluttered with extra variables. Directly appending a dict literal would be much cleaner:

for (x, quantile) in zip(X, quantiles):
    x = np.asarray(x)
    x = x[~(np.isnan(x) | np.isinf(x))]

    if len(x) == 0:
        vpstats.append({
            'vals': np.array([]),
            'coords': np.array([]),
            'mean': np.nan,
            'median': np.nan,
            'min': np.nan,
            'max': np.nan,
            'quantiles': np.array([]),
        })
    else:
        min_val = np.min(x)
        max_val = np.max(x)
        coords = np.linspace(min_val, max_val, points)

        vpstats.append({
            'vals': method(x, coords),
            'coords': coords,
            'mean': np.mean(x),
            'median': np.median(x),
            'min': min_val,
            'max': max_val,
            'quantiles': np.atleast_1d(np.percentile(x, 100 * quantile))
        })

But I'm not going to fight over this.

@rahulrathnavel
Copy link
Copy Markdown
Author

Hi @timhoffm and @story645, thank you both so much for talking through the design and for the phenomenal mentorship!

I have implemented the dict literal snippet exactly as requested, added the documentation comment for the NaN-stripping logic, and the CI checks are now 100% green! 😁

Since this is one of my very first open-source contributions, I really appreciate your patience and guidance in helping me get the code structure and standards just right. I am super excited and looking forward to seeing this merged! Let me know if you need absolutely anything else from me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: violinplot crashes on empty datasets

4 participants