Morrison on Metrics: Grouped thoughts on scattergrams

The finer points of plotting graphs and lines

When you create a scattergram, the most common next step in analysis and visualization is to superimpose a trend line on it. If you use Excel's default trend-line function, the line will be straight (known as the least squares line), and Excel can generate an equation for the line. A straight line, however, may convey a distorted sense of the trend when the scattergram has some unusually high or low numbers, or anomalies in the middle of fairly consistent numbers. True, you can selectively delete the outliers, but doing so leaves you vulnerable to charges of data manipulation.
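To make the distortion concrete, here is a minimal sketch of a least squares line computed by hand. The data are invented for illustration (ten points lying exactly on y = 2x, then the same points with one anomaly in the middle), and the function name `least_squares_line` is my own, not anything Excel exposes.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the ordinary least squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented data: ten points that lie exactly on y = 2x.
xs = list(range(1, 11))
ys = [2.0 * x for x in xs]
clean_slope, clean_intercept = least_squares_line(xs, ys)

# The same data with a single anomaly in the middle (y at x = 5 becomes 100).
ys_outlier = ys.copy()
ys_outlier[4] = 100.0
outlier_slope, outlier_intercept = least_squares_line(xs, ys_outlier)

print(f"clean:        slope={clean_slope:.3f}, intercept={clean_intercept:.3f}")
print(f"with outlier: slope={outlier_slope:.3f}, intercept={outlier_intercept:.3f}")
```

One odd point out of ten is enough to change both the slope and the intercept, so the line moves away from every other point as well.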

Alternatively, you may prefer to have the plot of points reveal its pattern with a line that fits the points more snugly. Various programs can create such a line, called a smoothing function, and it may have bumps and squiggles in it. One advantage of using a smooth fit, rather than a more traditional linear fit, is that smoothed lines are local. An outlier affects only the part of a smooth fit that lies near that point, whereas with a linear method outliers distort the entire straight line. The slope, like the rest of the equation generated, treats all the data points as equally influential.
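The difference between local and global can be sketched in a few lines. This is an illustration under my own assumptions: a simple three-point moving average stands in for "a smoother" (real programs use more refined methods), the data are invented, and the helper names are hypothetical. The sketch bumps one point and lists which fitted values move.

```python
def fitted_line(xs, ys):
    """Fitted values of the least squares line at each x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [intercept + slope * x for x in xs]


def moving_average(ys, span=3):
    """Centered moving average; windows are simply cut short at the ends."""
    half = span // 2
    out = []
    for i in range(len(ys)):
        window = ys[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out


def changed_positions(before, after, tol=1e-9):
    """Indices where two fits differ by more than tol."""
    return [i for i, (b, a) in enumerate(zip(before, after)) if abs(a - b) > tol]


# Invented data: a perfectly regular series, then the same with one outlier.
xs = list(range(10))
ys = [float(x) for x in xs]
bumped = ys.copy()
bumped[5] += 30.0

line_changed = changed_positions(fitted_line(xs, ys), fitted_line(xs, bumped))
smooth_changed = changed_positions(moving_average(ys), moving_average(bumped))

print("line fit moved at positions:  ", line_changed)
print("smooth fit moved at positions:", smooth_changed)
```

The straight line shifts at every position, while the moving average shifts only at the outlier and its two neighbors.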

One particular method of smoothing a set of data points is known as “53h.” The name is a descriptive designation of the steps involved, and it belongs to a larger family of similarly named smoothers. Let me explain this particular one. You take the running median of every five consecutive points, which in itself, if plotted, could yield a considerable smoothing effect. Next, you continue the process by taking the running median of every three of those medians. The “h” in 53h stands for hanning, a specific linear combination of three adjacent values from that second pass: one quarter of the value before, one half of the value itself and one quarter of the value after. The smoothed series that results can then be subtracted from the original data and the process repeated on the residuals. But that is getting beyond what I think I understand.
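For readers who want to see the steps, here is a rough sketch of a 53h smoother. It rests on several assumptions of mine: that "h" denotes hanning with weights of one quarter, one half and one quarter; that the median windows are simply cut short at the two ends of the series; and that the end values are left alone in the hanning step. Published versions treat the ends more carefully. The data and function names are invented for illustration.

```python
from statistics import median


def running_median(values, span):
    """Median of each centered window of `span` points (shorter at the ends)."""
    half = span // 2
    n = len(values)
    return [median(values[max(0, i - half): min(n, i + half + 1)])
            for i in range(n)]


def hanning(values):
    """Weighted average of each point with its neighbors: 1/4, 1/2, 1/4.
    The first and last values are left unchanged (an assumption)."""
    out = list(values)
    for i in range(1, len(values) - 1):
        out[i] = 0.25 * values[i - 1] + 0.5 * values[i] + 0.25 * values[i + 1]
    return out


def smooth_53h(values):
    """Medians of 5, then medians of 3, then hanning."""
    return hanning(running_median(running_median(values, 5), 3))


def smooth_53h_twice(values):
    """Smooth, then smooth the residuals and add them back to the first smooth."""
    first = smooth_53h(values)
    residuals = [v - s for v, s in zip(values, first)]
    return [s + r for s, r in zip(first, smooth_53h(residuals))]


# Invented series: steadily rising values with one wild point (50).
data = [1, 2, 3, 4, 50, 6, 7, 8, 9, 10]
smoothed = smooth_53h(data)
print([round(v, 2) for v in smoothed])
```

In this made-up series the wild value of 50 never reaches the smoothed output, because a single extreme point cannot be the median of five.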

Statisticians and visual data analysts use variations of the 53h smoothing method. They might take medians of every six data points, and then of every four, but the outcome is similar. All the data you have available can be incorporated, but extreme or odd values will only disturb a small, local portion of your overall smooth curve. A straight line makes it easy to extrapolate to some point, but that may be misleading.

You do need to have a fair amount of data for smoothing to work. Say you placed against one axis the number of lawyers in each law firm that you paid last year and on the other axis the amount you paid the firm. Many law departments would have more than 100 data points, and a smoothed function would effectively make visible and convey the whole data pattern. Try it yourself with the “scatter with smooth lines” function in the graphing section of Excel.

If you want a very sophisticated explanation and example of smoothing scatter-plot data, visit this website.

About the Author
Rees Morrison

Rees Morrison, Esq. is the founder of General Counsel Metrics, LLC. Based in Princeton, NJ, Rees has for the past 25 years consulted solely to law departments on a wide range of management challenges: operational reviews, cost control, re-engineering, structure and organization assessments, client satisfaction, technology, benchmarking. Rees has assisted more than 275 law departments on four continents. He also coordinates the largest law-department benchmarking database and analysis ever conducted with more than 1000 participants. 
