Saturday, October 13, 2012

Twitter Archiving Google Spreadsheet TAGS v3

The Twitter Archiving Google Spreadsheet (TAGS) is a tool for Twitter community visualisation. This new version has some coding improvements and new features, including a dashboard summary and advanced tools for getting user profile information and friend/follower relationships for social network analysis.

You can get a copy by selecting the link below:

Thursday, October 11, 2012

Learn to Read a Scientific Report

Causation vs. correlation
How do you know if a study’s results answer the question it set out to ask? Sometimes an outcome is just a coincidence—there’s a correlation but no causation. Meta-analyses pool the results of smaller studies and filter signal from that kind of noise. 
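One common way to pool studies is inverse-variance weighting, where more precise studies count for more. The sketch below is only an illustration of that idea: the three effect sizes and standard errors are made-up numbers, and real meta-analyses often use more elaborate models.

```python
from math import sqrt

# Hypothetical study results (not from any real paper):
# each tuple is (estimated effect, standard error of that estimate).
studies = [(0.30, 0.20), (-0.10, 0.25), (0.20, 0.15)]

def pool_fixed_effect(results):
    """Inverse-variance weighted average of effects and its standard error."""
    weights = [1 / se**2 for _, se in results]
    pooled = sum(w * effect for w, (effect, _) in zip(weights, results)) / sum(weights)
    pooled_se = 1 / sqrt(sum(weights))
    return pooled, pooled_se

pooled_effect, pooled_se = pool_fixed_effect(studies)
print(f"pooled effect = {pooled_effect:.3f}, standard error = {pooled_se:.3f}")
```

The pooled standard error comes out smaller than that of any single study, which is the sense in which pooling separates signal from noise.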

True size of the effect
Watch out for weasely language—a “threefold increase” might only be a shift from 1 percent to 3 percent. One recent paper reported that women’s mortality risk rose 133 percent. That sounds scary, but the elevated mortality rate was still just 1.9 percent. 
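The gap between relative and absolute change is easy to check with a few lines of arithmetic. This is a minimal sketch using the figures quoted above; the baseline for the mortality example is not given in the text, so it is back-calculated here on the assumption that the 133 percent rise is relative to that baseline.

```python
def relative_and_absolute(baseline_pct, elevated_pct):
    """Return (relative increase in percent, absolute increase in percentage points)."""
    absolute_points = elevated_pct - baseline_pct
    relative_pct = absolute_points / baseline_pct * 100
    return relative_pct, absolute_points

def implied_baseline(elevated_pct, relative_increase_pct):
    """Back out the baseline rate implied by a relative increase (an assumption, see above)."""
    return elevated_pct / (1 + relative_increase_pct / 100)

# "Threefold increase": a shift from 1 percent to 3 percent.
threefold_rel, threefold_abs = relative_and_absolute(1.0, 3.0)
print(f"1% -> 3%: +{threefold_rel:.0f}% relative, +{threefold_abs:.1f} points absolute")

# Mortality example: risk rose 133 percent, ending at 1.9 percent.
baseline = implied_baseline(1.9, 133)
mortality_rel, mortality_abs = relative_and_absolute(baseline, 1.9)
print(f"implied baseline = {baseline:.2f}%, absolute rise = {mortality_abs:.2f} points")
```

In both cases the headline number describes the relative change, while the absolute change is only a point or two.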

Statistical power
Look at two key factors, the n and the p. The n is the number of subjects used in the study. Multifaceted experiments typically have fewer subjects than simple surveys. Genetics studies need a big n. The p value lets you know whether the result is “statistically significant”: it’s the probability of seeing a result at least that extreme by chance alone, assuming there is no real effect. You want to see a p of less than 0.05. (Results can be statistically significant and still only show correlation, or have confounding factors.) 
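To see how n and p interact, here is a small sketch of a one-sided exact binomial test. The scenario is hypothetical: a treatment "works" in 60 percent of subjects, and the question is how likely a result at least that good would be if the true success rate were a coin flip (0.5).

```python
from math import comb

def one_sided_p_value(successes, n, null_rate=0.5):
    """P(X >= successes) when X ~ Binomial(n, null_rate)."""
    return sum(
        comb(n, k) * null_rate**k * (1 - null_rate) ** (n - k)
        for k in range(successes, n + 1)
    )

# Same 60 percent success rate, two different sample sizes.
p_small = one_sided_p_value(6, 10)
p_large = one_sided_p_value(60, 100)
print(f"6 of 10:   p = {p_small:.3f}")
print(f"60 of 100: p = {p_large:.3f}")
```

With 10 subjects the result is nowhere near the 0.05 threshold; with 100 subjects the same success rate falls below it. That is why the n matters as much as the p.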

Conflicts of interest
Most journals now note this as a matter of policy. Was the company making the drug or product associated with the laboratory that did the study? Are any of the authors trying to sell a product? For example, the authors of a study exploring the effectiveness of “brain training” techniques on cognitive enhancement worked for the company that developed (and sold) those techniques. They disclosed this, but that’s still a red flag.

Wednesday, October 10, 2012

30 big data project takeaways

  1. Where do you start a big data project? Skunk works projects were a popular route, and those groups then grew to dozens of employees and petabytes of data. Other options included starting with an underserved business unit. Some companies had business leaders as sponsors.
  2. Leaders will have to take a few chances on big data projects. Translation: Trust your people, spend some money and take the leap.
  3. Use cases for big data abound. Among the possibilities:
    • Network optimization. 
    • Fraud detection.
    • Seeing what the customer experiences. 
    • Healthcare simulations.
    • Consumer-focused marketing efforts require more social networking analysis and predictive capabilities. Consumer data is inherently unstructured. 
    • Travel and expense management to make intelligent decisions about costs. For instance, with data aggregated across 200,000 employees, a company could notice it is sending too many people to one conference.
    • Marketing support and tracking of attrition rates in a subscriber-based business.
    • Closer ties between partners and suppliers via collaborative data and insight sharing. 
    • Christine Twiford, Manager, Network Technology Solutions at T-Mobile, said analytics gave the wireless provider confidence that it could offer an unlimited data plan without crushing the network.
  4. Analytics and business intelligence are bridging into big data applications. Historical data from years back has been usable, said Michael Cavaretta, Technical Leader, Predictive Analytics & Data Mining at Ford. In the future, Cavaretta said Ford will focus on data from the vehicle, but the real win may be the stream of information through the manufacturing process.
  5. The big data Petri dish will be the healthcare industry. "There's a lot of incentive out there to use big data to improve healthcare," said Katrina Montinola, Vice President of Engineering at Archimedes.
  6. Facebook is another big data Petri dish. Facebook could use big data techniques to make more money---while treading carefully on privacy. Conversely, Facebook is a huge data set by definition. After all, one billion users are sharing gobs of data. Facebook data could "provide an X-ray view" of what's going on in a customer's head. Companies could optimize that data to improve experience. Montinola said that Facebook would provide an ideal population for clinical trials. Skytland said Facebook could be "an amazing platform for collective action."
  7. "Big data is the oil of the information age," said Nicholas Skytland, Program Manager, Open Government Initiative.
  8. Shared analytics services are commonly used as a way to harness big data and blend in predictive techniques.
  9. Storage will be an ongoing big data issue because data scientists are pack rats---even hoarders---but there's a budget limit. T-Mobile can only keep 10 days of its clickstream data, said Twiford, who noted the company is trying to process more information in flight. Storage limitations will result in sampling.
  10. As for data sampling, data scientists will ultimately make the call on what information is hoarded and what's sampled.
  11. Data scientists will be in high demand and serve as investigators that test hypotheses. Data scientists will be paired with business domain experts. What's unclear is how many of these data wonks you need. In many respects, we'll all be data scientists to some degree---or at least data literate. Twiford said there's a talent challenge. There's also a challenge in recruiting big data talent and companies should look beyond Silicon Valley.
  12. Big data talent is tough to find. One company appointed internal people with business knowledge and supplemented them with a partner that had statistics and analytics wonks available (consultants). The long-term talent strategy for this company is to recruit heavily from universities to build an analytic employee pool. Talent has to be able to use data.
  13. Visualization tools and crowdsourcing may alleviate the big data talent crunch, said Skytland. Perhaps "citizen scientists" will bridge the gap, said Skytland. Visualization tools can bring big data to the masses.
  14. Universities and retraining will also bridge the big data talent gap.
  15. Too much time is being spent preparing big data and not enough actually analyzing it. Discovery and decision-making are being short-changed for preparation. Data preparation should be automated.
  16. When pitching big data to business leaders you need to start with this question: What business questions need to be answered?
  17. Most corporate big data projects are in their infancy. As a result, many are looking to combine data warehouse information with other data to be prescriptive. One company was looking to build a data warehouse on steroids.
  18. Partner with companies that can provide visualization tools via APIs. Of course, you have to liberate your data and open it up first, said Skytland.
  19. NASA is planning missions that will collect 24 terabytes of data a day. "We want to make sense of that data and actually navigate it," said Skytland.
  20. There are thousands of silos in corporate America and sharing data is one of the biggest challenges. Big data could be a way to bridge those corporate silos.
  21. Big data applications are rolling out first for business-to-consumer questions because they tie together experience, sales and analytics. Social media and multiple channels also mean that companies need to look for patterns in streaming data, said James Kobielus, IBM's big data evangelist.
  22. Hadoop clusters are surfacing everywhere in corporate America. If 2012 was the year of enterprise Hadoop pilots, 2013 will be the year usage ramps up.
  23. NASA initially created its own big data systems, but is now using more commercial offerings, including Amazon Web Services and cloud infrastructure.
  24. Big data isn't new, but now has reached critical mass as people digitize their lives. "People are walking sensors," said Skytland.
  25. Social media is hyped in big data applications, but the diary of consumers' lives is great market intelligence. Chief marketing officers are pushing social media and big data projects. Cavaretta said Ford is using social data because it goes beyond what consumers provide in surveys and "represents what they are thinking."
  26. IT practitioners said that they wanted the largest data sets possible. The idea is that companies wouldn't have to rely on samples. However, there's a business challenge in determining what information is worth keeping and what should head to the archive or be tossed.
  27. Making archived data usable for big data projects is going to be a running challenge.
  28. Governments and their ability to provide datasets can create entire industries. Under this theory, governments will essentially be data providers as one of their primary functions.
  29. Twiford said that T-Mobile is using big data techniques to learn more about the preferences of no-contract customers, who don't offer as much profile information as contract ones.
  30. Data analytics as a service and data visualization as a service will become commonplace. Third-party vendors will move toward big data as a service to make it consumable for the masses. The tech vendors likely to go this route are today's big market share leaders (IBM, SAP, Oracle,

Sunday, October 7, 2012

Solomon Redivivus, 1886 by Constance Naden

WHAT am I? Ah, you know it,
I am the modern Sage,
Seer, savant, merchant, poet—
I am, in brief, the Age.

Look not upon my glory
Of gold and sandal‐wood,
But sit and hear a story
From Darwin and from Buddh.

Count not my Indian treasures,
All wrought in curious shapes,
My labours and my pleasures,
My peacocks and my apes;

For when you ask me riddles,
And when I answer each,
Until my fifes and fiddles
Burst in and drown our speech,

Oh then your soul astonished
Must surely faint and fail,
Unless, by me admonished,
You hear our wondrous tale.

We were a soft Amœba
In ages past and gone,
Ere you were Queen of Sheba,
And I King Solomon.

Unorganed, undivided,
We lived in happy sloth,
And all that you did I did,
One dinner nourished both:

Till you incurred the odium
Of fission and divorce—
A severed pseudopodium
You strayed your lonely course.

When next we met together
Our cycles to fulfil,
Each was a bag of leather,
With stomach and with gill.

But our Ascidian morals
Recalled that old mischance,
And we avoided quarrels
By separate maintenance.

Long ages passed—our wishes
Were fetterless and free,
For we were jolly fishes,
A‐swimming in the sea.

We roamed by groves of coral,
We watched the youngsters play—
The memory and the moral
Had vanished quite away.

Next, each became a reptile,
With fangs to sting and slay;
No wiser ever crept, I’ll
Assert, deny who may.

But now, disdaining trammels
Of scale and limbless coil,
Through every grade of mammals
We passed with upward toil.

Till, anthropoid and wary
Appeared the parent ape,
And soon we grew less hairy,
And soon began to drape.

So, from that soft Amœba,
In ages past and gone,
You’ve grown the Queen of Sheba,
And I King Solomon.