class: center, middle, inverse, title-slide

# Evaluating Content-Related Validity Evidence Using Text Modeling

### Daniel Anderson
### Brock Rowley
### Sondra Stegenga
### P. Shawn Irvin
### Joshua M. Rosenberg

---
# Background
### Content-related validity evidence

--

* One of five major sources of validity evidence (as outlined by the [*Standards*](https://www.apa.org/science/programs/testing/standards))

--

* Does the content represented on the test reflect the targeted content?
  + Are specific areas missing?
  + Are specific areas over-represented?

--

* Operationally, evidence is often gathered through *alignment studies*.
  + Judgments made by panels of experts (educators).
  + Do the test items align with the content standards?

---
# Study purpose

Extend content-related validity evidence through the use of text mining

--

* What thematic topics are represented in the content standards?

--

* How do individual items map onto these topics (if at all)?

--

* What is the overall coverage of the topics across test items?

---
# Topic modeling

* Corpus of words split into *documents*

--

  + We treat each content standard as a document

--

* Latent variables (topics) estimated from word co-occurrence

--

  + The number of topics to estimate is determined by the researcher (similar to exploratory factor analysis)

--

* Each document is a mixture of topics
  + `\(\gamma\)` estimates provide the probability that a given topic is represented within a document
* Each topic is a mixture of words
  + `\(\beta\)` estimates provide the probability that a given word is represented within a topic

---
class: middle

.major-emph-green[The fundamental idea]

--

1. Train a model on the content standards to estimate the latent topics represented therein.

--

2. Apply the model to the test items to estimate which topics the items represent (based on the text within the item). A minimal code sketch follows on the next slide.
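---
# The fundamental idea, in code

A minimal sketch of the two steps using the topicmodels and tidytext packages. Here `standards_dtm` and `items_dtm` are assumed document-term matrices built from the standards and item text; the names are illustrative, not necessarily the code used in this study.

```r
library(topicmodels)
library(tidytext)

# 1. Train the topic model on the content standards
lda_fit <- LDA(standards_dtm, k = 5, control = list(seed = 123))

betas  <- tidy(lda_fit, matrix = "beta")   # P(word | topic)
gammas <- tidy(lda_fit, matrix = "gamma")  # P(topic | standard)

# 2. Apply the trained model to the test items
item_topics <- posterior(lda_fit, newdata = items_dtm)$topics
```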
---
# Our application

* Science NGSS Performance Expectations
* Grade 8 statewide Alternate Assessment based on Alternate Achievement Standards (AA-AAS)
  + Designed for students with the .bolder[most] significant cognitive disabilities.
  + 1% reporting cap
  + Reduced in depth, breadth, and complexity

---
# Analyses

* Topics estimated using Latent Dirichlet Allocation
  + Common stop words removed ("and", "of", "the", etc.)
  + Webb's DOK verbs removed ("choose", "describe", "find")

--

* Four methods evaluated to determine the optimal number of topics (2-25 evaluated; see the sketch on the next slide)
  + Arun et al. (2010): KL divergence
  + Cao et al. (2009): Cosine similarity
  + Deveaud et al. (2014): Jensen-Shannon distance
  + Griffiths & Steyvers (2004): Harmonic mean of posterior log-likelihoods

--

* Smaller range of topics evaluated by two science content experts for substantive meaning
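---
# Choosing the number of topics, in code

All four metrics above are implemented in the ldatuning package. A hedged sketch, again assuming a `standards_dtm` document-term matrix:

```r
library(ldatuning)

k_search <- FindTopicsNumber(
  standards_dtm,
  topics  = 2:25,
  metrics = c("Griffiths2004", "CaoJuan2009", "Arun2010", "Deveaud2014"),
  method  = "Gibbs",
  control = list(seed = 123)
)

# One panel per metric, plotted across candidate values of k
FindTopicsNumber_plot(k_search)
```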
---
background-image: url(ncme19_files/figure-latex/n-topics-1.png)
background-size: contain
background-position: 90% 50%
class: middle

# Results: `\(n\)` topics

* 3-6 topics evaluated for substantive meaning
* 5-topic solution independently arrived upon
  + Distinct topics, little redundancy

---
# Topics

<br/>

| Topic | Substantive Label |
|:-----:|:----------------------------------------------------------------------|
| 1 | Analyzing data and using evidence to understand organisms and systems |
| 2 | Using scientific evidence to understand Earth systems |
| 3 | Energy |
| 4 | Genetic information |
| 5 | Scientific solutions |

---
background-image: url(ncme19_files/figure-latex/heatmap-1.png)
background-size: contain
background-position: 90% 50%
class: middle

# Mapping topics
# to standards

* Most standards represented by a single topic

---
background-image: url(ncme19_files/figure-latex/word-freq-1.png)
background-size: contain
background-position: 90% 50%
class: middle

# Mapping words
# to topics

* Top 15 words displayed

---
background-image: url(ncme19_files/figure-latex/unnamed-chunk-2-1.png)
background-size: contain
background-position: 90% 50%
class: middle

# Predicting
# items
# to topics

* Nine random items displayed

---
background-image: url(ncme19_files/figure-latex/unnamed-chunk-3-1.png)
background-size: contain
background-position: 90% 50%
class: middle

# Topic coverage

---
# Discussion

* Content validity is a critical component of the "overall evaluative judgment" (Messick, 1995) of the validity of test-based inferences

--

* Particularly important within standards-based educational systems

--

* Text modeling may serve as an additional source of evidence (triangulation)

--

* May be useful as a diagnostic tool

---
# Limitations & Future Directions

* Results depend on the chosen topic model - different models may lead to different inferences

--

* Our model is preliminary, but publicly available. Consensus from the field could help inform models that are useful and provide better validity evidence.

--

* Our application was in science with an AA-AAS
  + Generalizability to other content areas/tests is not known

--

* What if an item has no text?
  + Text modeling could perhaps be used to help "flag" items for further investigation
  + Alternative ML procedures (e.g., image recognition) may help

---
# Conclusions

* Text mining procedures may provide an additional source of evidence
  + Perhaps supplementing formal alignment studies
* Evidence could be used diagnostically
* Topic modeling itself may be useful in understanding the topics represented in either the standards or a given test, independent of the linkage between the two.

---
class: inverse center middle

# Thanks

[repo](https://github.com/datalorax/text-analysis-content-validity)
[daniela@uoregon.edu](mailto:daniela@uoregon.edu)
[@datalorax](https://twitter.com/datalorax_)

<br/>

Slides available at: http://www.datalorax.com/talks/ncme19/