Darrell Issa
- Came in late. He was mostly answering questions about medical care.
David Williams, IG at USPS
- Some good quotes about the changes government needs to make, especially in managing and using data, including:
- "Be everywhere businessman or be gone"
- "No army in the world is big enough to stop an idea whose time is come" - Victor Hugo
- The middleman costs more than manufacturing
- Government drowning in data
- Future of governance
- must use data
- can't be bottlenecked
- need interdependencies
- Problems: data in silos, and we're buying data back from vendors
- Solution: essentially, sharing data and databases.
- Not sure how this works for the IC.
Deloitte plenary
- Incredibly boring
- Bottom line: they made changes to data storage, data use, and applications that ultimately saved the Navy about $19 million.
- Nice job Deloitte.
- Deloitte once again.
- Risk assessment in the brokerage industry.
- Giulio Girardi
- PhD in economics
- Foreign speaker
- Began with the numbers: 4,500 registered broker-dealer firms.
- OCIE (the SEC's Office of Compliance Inspections and Examinations) is responsible for inspections and examinations.
- Limited resources for so many firms.
- Broker-dealer firms are classified into seven peer groups based on filing criteria and business practices.
- Predictive analytics are then run within each group.
- Firms within each group are ranked in some way.
- Methodology
- Gather data
- Develop hypotheses and identify criteria (predictors)
- Test hypotheses by building a model
- Three areas of risk assessment:
- financial and operational
- workforce
- firm structure and supervision
- They simply sum the scores on four criteria (total score of 12).
- In the future, they may no longer use a linear model, because one criterion could be a better predictor than the others.
- Ultimately, each firm gets a high, medium, or low risk designation.
- These labels depend on the criteria (no hard cutoff).
- They like to label certain proportions of firms as high, medium, and low.
- Also, they look for breaks in the data
- It's a continuous feedback model.
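The scoring scheme described above (sum the criterion scores, rank firms within their peer group, then assign high/medium/low by target proportions) can be sketched roughly as follows. This is only an illustration under assumptions, not OCIE's actual model: the firm names, the 0-3 scale per criterion (chosen so four criteria add up to 12), and the 20%/30% label proportions are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Firm:
    name: str
    peer_group: int
    scores: tuple  # hypothetical: four criterion scores, each on a 0-3 scale


def total_score(firm: Firm) -> int:
    # Unweighted linear model: simply sum the criterion scores (maximum 12).
    return sum(firm.scores)


def label_peer_group(firms, high_share=0.2, medium_share=0.3):
    """Rank firms within one peer group and label them by target proportions."""
    ranked = sorted(firms, key=total_score, reverse=True)
    n_high = round(len(ranked) * high_share)
    n_medium = round(len(ranked) * medium_share)
    labels = {}
    for rank, firm in enumerate(ranked):
        if rank < n_high:
            labels[firm.name] = "high"
        elif rank < n_high + n_medium:
            labels[firm.name] = "medium"
        else:
            labels[firm.name] = "low"
    return labels


def label_all(firms):
    """Group firms by peer group, then label each group independently."""
    groups = {}
    for firm in firms:
        groups.setdefault(firm.peer_group, []).append(firm)
    labels = {}
    for members in groups.values():
        labels.update(label_peer_group(members))
    return labels
```

Because labels are assigned within each peer group, a firm with a modest total can still be "high" relative to its peers. Replacing the plain `sum` with a weighted sum is one way to move away from the equal-weight linear model, as the speaker suggested they might.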
Text mining case studies
- Textbook: Practical Text Mining
- Dr. Andrew Fast
- The person behind ESPN's coach success predictor
- Big data's three Vs: volume, variety, velocity
- Variety
- A big data system is a system that integrates information from varied sources for deeper and broader understanding (Sue Feldman, CEO of Synthexis)
- Combine structured and unstructured text for more power
- Issues include incomplete foreign keys and keys across data entered manually.
- Need to improve support for secondary users of the data.
- The real goal is to extract structured data from unstructured text.
- Examples presented included pulling SSNs, phone numbers, etc. from text in places where the postal worker failed to collect them.
- Lesson 1: expand your data by using extra data sets.
- Lesson 2: expand your query (e.g., theft, stolen, opened, lost, not delivered, missing).
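Pulling structured fields such as SSNs and phone numbers out of free text often starts with pattern matching. The sketch below uses simple regular expressions for U.S.-style formats; the patterns and the sample comment are illustrative assumptions, not what the presenters used, and real data would need stricter validation (e.g., rejecting invalid SSN ranges).

```python
import re

# Illustrative U.S.-style patterns; production use would need stricter validation.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
PHONE_PATTERN = re.compile(r"(?:\(\d{3}\)\s?|\b\d{3}[-.\s])\d{3}[-.]\d{4}\b")


def extract_fields(text: str) -> dict:
    """Return candidate SSNs and phone numbers found in free text."""
    return {
        "ssn": SSN_PATTERN.findall(text),
        "phone": PHONE_PATTERN.findall(text),
    }


comment = "Customer (SSN 123-45-6789) asked for a callback at (555) 867-5309."
print(extract_fields(comment))
```

Extracted values would then be written back into the structured fields they were missing from, ideally with a human reviewing low-confidence matches.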
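Query expansion can be as simple as mapping a seed term to a hand-built list of related terms and matching any of them. The related terms below are the ones from the talk's example; the matching rule (case-insensitive, whole word or phrase) and the sample complaints are assumptions for illustration.

```python
import re

# Expansion list taken from the talk's example; a real system might add stemming
# or synonyms learned from the data.
EXPANSIONS = {
    "theft": ["theft", "stolen", "opened", "lost", "not delivered", "missing"],
}


def expand_query(term: str) -> list:
    """Return the term plus any related terms we know about."""
    return EXPANSIONS.get(term.lower(), [term.lower()])


def matches(text: str, term: str) -> bool:
    """True if the text contains the term or any expansion as a whole word or phrase."""
    lowered = text.lower()
    return any(re.search(rf"\b{re.escape(t)}\b", lowered) for t in expand_query(term))


complaints = [
    "My package was never delivered and is missing.",
    "The envelope arrived already opened.",
    "The carrier was very polite.",
]
hits = [c for c in complaints if matches(c, "theft")]
```

Neither matching complaint contains the word "theft", which is the point: the expanded query finds them anyway.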
- Group documents with similar content.
- Example given of a group that needed to find a document but couldn't really describe it.
- He says you can use the entire document as the query.
- The strategy is called cosine similarity.
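The "entire document as the query" idea can be sketched with cosine similarity over word-count vectors: documents pointing in a similar direction in term space score close to 1, unrelated ones close to 0. This is a minimal version assuming plain bag-of-words counts; TF-IDF weighting is a common refinement, and the corpus is made up.

```python
import math
import re
from collections import Counter


def term_vector(text: str) -> Counter:
    # Bag-of-words term counts (no stemming or TF-IDF weighting).
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def more_like_this(query_doc: str, corpus: dict, top_k: int = 3):
    """Rank corpus documents by cosine similarity to an entire query document."""
    q = term_vector(query_doc)
    scored = [(cosine_similarity(q, term_vector(body)), name) for name, body in corpus.items()]
    scored.sort(reverse=True)
    return [(name, score) for score, name in scored[:top_k]]


corpus = {
    "a": "mail theft stolen package",
    "b": "recipe for bread",
    "c": "stolen mail",
}
print(more_like_this("my mail was stolen", corpus))
```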
- Lesson 3: multiply your efforts
- Case study: SSA disability approval
- Pain: the approval process takes up to 2 years
- Goal: fast-track easy cases
- Challenge: free text on the disability application
- Results: 20% of approvals possible immediately
- Highlight patterns of language likely to indicate abuse
- Uncover indicators mentioned in comments (financial stress)
- Look at supervisor notes and other oversight information (personnel risk)
- Lesson 4: combine approaches
- Text mining can be viewed from many perspectives
- no single view provides complete solution
- must consider entire beast to get best solution
- Blind men and elephant analogy
- Finding elephants
- Bigger data: which ZIP codes have complained about Cash4Gold
- Query expansion: mail theft complaints
- More like this: finding recipes for WMD
- Each text mining area provides a different trade-off between power and generality.
- Document classification is most powerful
After-lunch plenary: John Elder of Elder Research
- General lessons we can learn from black box trading
- Investment modeling
- Started the company on the strength of a slim advantage over other hedge funds. Had a lot of success, and closed the fund at its peak because the numbers said they would no longer have an edge.
- Success is possible.
- Huge reward; data plentiful but noisy (Bloomberg earnings); market efficient, with pockets of inefficiency; skill is almost indistinguishable from luck; system can change overnight.
- The discipline of partially solving these problems has improved much of our other work.
- Most of our failures as a company have been in the stock market.
- "WE FOUND SOMETHING"
- New hedge fund investment system.
- Down to two parameters.
- Data challenges: leaks from the future (e.g., predicting interest rates)
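One common guard against leaks from the future is a walk-forward (time-ordered) split, in which every training observation strictly precedes the test period. This is a generic sketch, not Elder Research's procedure; the monthly dates are hypothetical stand-ins for something like interest-rate readings.

```python
from datetime import date


def walk_forward_splits(dates, n_splits):
    """Yield (train_idx, test_idx) pairs in which every training date precedes every test date.

    Assumes `dates` is strictly increasing (one observation per period).
    """
    if any(earlier >= later for earlier, later in zip(dates, dates[1:])):
        raise ValueError("dates must be strictly increasing")
    fold = len(dates) // (n_splits + 1)
    if fold == 0:
        raise ValueError("not enough observations for the requested number of splits")
    for k in range(1, n_splits + 1):
        train_end = k * fold
        test_end = min(train_end + fold, len(dates))
        yield list(range(train_end)), list(range(train_end, test_end))


# Hypothetical monthly observations.
months = [date(2012, m, 1) for m in range(1, 13)]
for train_idx, test_idx in walk_forward_splits(months, n_splits=3):
    # No leak: the latest training date is strictly earlier than the first test date.
    assert months[train_idx[-1]] < months[test_idx[0]]
```

A random shuffle-and-split on time series would let the model train on data from after the period it is tested on, which is exactly the kind of result that "works too well."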
- Have to hire someone to break your stuff.
- Data analysts are like artists, they love their models.
- But people just don't think of everything.
- Tough to build something idiot-proof because idiots are so ingenious.
- Look for things that work TOO WELL. Issues most likely exist.
- Model goal: get computer to feel like you do.
- Be careful about maximizing accuracy, because not all errors are equal.
- Resampling to evaluate accuracy (e.g. cross-validation)
- Train V models on different data subsets.
- Test each on unseen data.
- Use the distribution of results to get a realistic score for the model.
- What the world needs is a one-armed statistician. On the one hand... No other hand.
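The resampling steps above amount to V-fold cross-validation: deal the data into V folds, train on V-1 of them, score on the held-out fold, and look at the spread of the V scores rather than a single number. To keep the sketch self-contained, the "model" is a deliberately trivial placeholder (predict the training mean) and the data are synthetic; any real fitting function with the same `fit(train_x, train_y) -> predictor` shape could be swapped in.

```python
import random
import statistics


def v_fold_indices(n, v, seed=0):
    """Shuffle indices 0..n-1 and deal them into v roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::v] for i in range(v)]


def fit_mean(train_x, train_y):
    # Deliberately trivial placeholder model: always predict the training mean.
    mean = statistics.fmean(train_y)
    return lambda x: mean


def cross_validate(xs, ys, fit=fit_mean, v=5, seed=0):
    """Train v models, each scored by mean squared error on its held-out fold."""
    scores = []
    for fold in v_fold_indices(len(xs), v, seed):
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        scores.append(statistics.fmean([(model(xs[i]) - ys[i]) ** 2 for i in fold]))
    return scores


xs = list(range(50))
ys = [2.0 * x + random.Random(x).gauss(0, 5) for x in xs]
scores = cross_validate(xs, ys)
print(f"MSE per fold: {[round(s, 1) for s in scores]}")
print(f"mean {statistics.fmean(scores):.1f}, stdev {statistics.stdev(scores):.1f}")
```

A wide spread across folds is itself a warning sign, even when the average looks good.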
- What's the chance I could get a result like this by chance? That is the essential question for any statistical test.
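A permutation test asks that question directly: shuffle the group labels many times and count how often a difference at least as large as the observed one shows up by chance. This sketch tests a difference in means; the two sets of returns are invented for illustration.

```python
import random
import statistics


def permutation_p_value(a, b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means between samples a and b."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(a) - statistics.fmean(b))
    pooled = list(a) + list(b)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(statistics.fmean(pooled[:len(a)]) - statistics.fmean(pooled[len(a):]))
        if diff >= observed:
            at_least_as_extreme += 1
    # The +1 terms count the observed split itself, so p can never be exactly 0.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Hypothetical returns for a candidate strategy versus a baseline.
strategy = [0.4, 1.1, 0.9, 1.3, 0.7, 1.0]
baseline = [0.2, 0.5, -0.1, 0.3, 0.6, 0.0]
print(permutation_p_value(strategy, baseline))
```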
- 5 lessons learned:
- 1. Assess costs and potential rewards (small improvements may lead to large rewards, later technology may matter, custom error metrics may be worth the trouble).
- 2. Must have access to domain knowledge.
- 3. Data is going to be flawed, but don't let it stop you. Don't wait for the data warehouse.
- 4. Work extremely hard to break your model. You need outside help, resampling is essential, and you should visualize failure; you need to reward breaking.
- 5. Share the work and share the reward, because that will grow the pie.
Plenary Panel led by Dean Silverman (IRS)
- Developing an analytics framework and measuring success
- Roles: data analysis, evangelist, storage, and something else...
- He's more on the data evangelist side.
Accenture
- Advanced analytics deliver insight for improved sales.
- What they're doing to bring the Postal Service into the 21st century.
- 560,000 employees
- Over 200,000 vehicles
- 36,400 outlets, more than McDonald's, Walmart, and Starbucks combined
- 584 million pieces of mail a day to over 150 million residences, PO boxes, and businesses.
- Sales is responsible for $48 billion of USPS's $66 billion total sales revenue.
- 700+ sales reps, whereas USPS has more than 4000.
- Problem: declining revenue with lower mail volume
- Limited ability to hire to boost sales
- Need to become more efficient.
- No single view of customer, no data driving decisions.
- Solutions: platform (bring data onto one platform for a single view of the customer), process (build models), and a third thing I didn't catch... sales?
- How do you put everything in one central location?
- Talk about predictive analytics and sales effectiveness
- Salespeople were making gut decisions
- Accenture speaker, Southern. Designed the model, processed the data, built the model, implemented it, and assessed it.
- Logistic and linear regression worked best for this project
- Probability that a sale would occur (logistic)
- Estimated revenue from a sale (linear)
- Total sales are up significantly
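One natural way to put the two models together is to rank accounts by expected revenue: the logistic model's probability of a sale times the linear model's estimated revenue if the sale happens. Multiplying the two outputs is an assumption here; the talk did not say how they were combined. The feature names, coefficients, and accounts are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Account:
    name: str
    features: dict  # hypothetical feature name -> value


# Hypothetical fitted coefficients; real ones would come from training on sales history.
LOGIT_INTERCEPT = -2.0
LOGIT_COEFS = {"prior_volume": 0.8, "recent_contact": 1.5}
LINEAR_INTERCEPT = 1_000.0
LINEAR_COEFS = {"prior_volume": 5_000.0, "recent_contact": 0.0}


def sale_probability(acct: Account) -> float:
    # Logistic regression: the sigmoid of a linear score gives P(sale).
    z = LOGIT_INTERCEPT + sum(c * acct.features.get(k, 0.0) for k, c in LOGIT_COEFS.items())
    return 1.0 / (1.0 + math.exp(-z))


def estimated_revenue(acct: Account) -> float:
    # Linear regression: estimated revenue if a sale occurs.
    return LINEAR_INTERCEPT + sum(c * acct.features.get(k, 0.0) for k, c in LINEAR_COEFS.items())


def expected_revenue(acct: Account) -> float:
    return sale_probability(acct) * estimated_revenue(acct)


def prioritize(accounts):
    """Return (name, expected revenue) pairs, highest expected revenue first."""
    ranked = sorted(accounts, key=expected_revenue, reverse=True)
    return [(a.name, round(expected_revenue(a), 2)) for a in ranked]


leads = [
    Account("new_prospect", {}),
    Account("big_shipper", {"prior_volume": 3.0, "recent_contact": 1.0}),
]
print(prioritize(leads))
```

Ranking by expected value, rather than by probability alone, keeps a likely-but-tiny sale from crowding out a less likely but much larger one.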
Hudson Hollister: Open Data Reforms
- Founder and Executive Director of Data Transparency Coalition.
- Washington policymakers are getting their act together.
- Want open data in structured formats for everybody.
- Seven buckets represent the federal government:
- Federal spending: inconsistent formats, lack of identifiers, complex reporting structure; the DATA Act will lead to a transformation, with implications for analytics. Five people in OMB understand how the MAX budget system works. He hasn't met them. THERE IS NO DATA GOVERNANCE IN FEDERAL SPENDING. The Treasury Department is now required by law to provide identifiers and more structure. Senate sponsor: Mark Warner. GPRAMA (the GPRA Modernization Act) says performance and spending can be tracked on a program-by-program basis.
- Management: subject matter experts, but unstructured. Obama's Open Data Policy directs all departments to create a data inventory. The default should be open data (defined by seven attributes). Roadblocks include the SMEs. Most material is not going to be recognized as important or necessary for this effort. DOCUMENTS ARE DATA.
- Financial regulation: financial regulators do not coordinate and collect overlapping information. The FIT Act requires the SEC to use the same standards for financial regulation. OFR has authority to force all regulators to adopt standards in regulation.
- General regulation: same issues. Will our enemies have the same access? Yes. That is an issue.
- Tax: standardized formats for tax returns made TurboTax possible. KUDOS to the IRS for doing that in the '90s. The only exception is nonprofits: their returns are brought in through XML, but then put into TIFF documents. Obama proposed changing this in the 2012 budget. Unfortunately, no member of Congress has stepped up to propose it.
- Legislation and the code: need to structure these so we can take advantage of searching and analysis. Boehner and Cantor say we have to look to XML.
- Judicial: diverse formats. Some briefs are in WordPerfect :).
- Need to replace PDFs with page breaks... What does that mean???
- Imagine if we could combine all of these data...
- We can tie everything together. The benefits far outweigh the disadvantages.
- The prospect of automating all reporting is a huge benefit.
- It will eliminate so many compliance lawyers and so much paper, and solve a lot of problems.
- DATA TRANSPARENCY COALITION
- OMB setting up its own analytics office.
- Much further along in other nations than it is in the US. UK is several years ahead.
- theODI.org: they certify datasets as open.