Condon, D M and Revelle, W 2016 Selected ICAR Data from the SAPA-Project: Development and Initial Validation of a Public-Domain Measure. Journal of Open Psychology Data, 4: e1, DOI: http://dx.doi.org/10.5334/jopd.25

DATA PAPER

Selected ICAR Data from the SAPA-Project: Development and Initial Validation of a Public-Domain Measure

David M. Condon1 and William Revelle2

1 Department of Medical Social Sciences, Northwestern University, Chicago, Illinois, US
david-condon@northwestern.edu
2 Department of Psychology, Northwestern University, Evanston, Illinois, US
revelle@northwestern.edu

These data were collected during the initial evaluation of the International Cognitive Ability Resource (ICAR) project. ICAR is an international collaborative effort to develop open-source public-domain tools for cognitive ability assessment, including tools that can be administered in non-proctored environments (e.g., online administration) and those which are based on automatic item generation algorithms. These data provide initial validation of the first four ICAR item types as reported in Condon & Revelle [1]. The 4 item types contain a total of 60 items: 9 Letter and Number Series items, 11 Matrix Reasoning items, 16 Verbal Reasoning items and 24 Three-dimensional Rotation items. Approximately 97,000 individuals were administered random subsets of these 60 items using the Synthetic Aperture Personality Assessment method between August 18, 2010 and May 20, 2013. The data are available in rdata and csv formats and are accompanied by documentation stored as a text file. Re-use potential includes a wide range of structural and item-level analyses.

Keywords: Cognitive ability; intelligence; cognitive ability structure; International Cognitive Ability Resource; ICAR

Funding statement: This research was supported in part by grant SMA-1419324 from the National Science Foundation to William Revelle.
(1) Overview
Collection Date(s)
Data were collected between August 18, 2010 and May 20, 2013.

Background
The SAPA Project is a collaborative online data collection tool for assessing psychological constructs across multiple domains of personality. These domains – temperament, cognitive abilities, and interests – have been chosen based on historical and current prominence in the field of individual differences research. The primary goal of the SAPA Project is to determine the combined and independent structures of each of these domains based on the collection of large, cross-sectional, online samples. Secondary goals include (a) the identification of additional domains (e.g., motivation, character) which may also provide insight into the ways that individuals differ; and (b) an improved understanding of the demographic and psychographic correlates of individual differences in personality.

The data described here were collected to evaluate the item characteristics, reliability and structural properties of ICAR measures assessing 4 distinct components of cognitive ability. The data were also used to validate the ICAR items based on online administration relative to self-reported achievement test scores and university majors; see Condon & Revelle [1]. The broader goal of these analyses was to demonstrate the utility and potential for public-domain cognitive ability measures that can be administered without proctoring via the internet.

It should be noted that additional ICAR item types have been developed since these data were collected and further development remains ongoing. More information about all of the ICAR measures, including the item content, can be found at the ICAR website (icar-project.com).

(2) Methods
Sample
Participants (N = 96,958) completed the online survey in exchange for feedback about various aspects of their personality and cognitive abilities. No active advertisements or marketing efforts were used to attract participants for this data collection; web traffic statistics (collected through Google Analytics) suggest that participants who did not come to the website directly were directed to it through links from various other websites about personality, personality research, general psychology topics, and psychometrics. Many of these websites were academic/educational in nature.

Many demographic and psychographic variables are included in the data. These include: gender (66.2% of the participants were female); age (see Figure 1); marital status (see Table 1); body mass index (see Figure 2); country (198 countries were represented in total; 78.1% of participants were from the United States, and 34 countries had more than 100 participants); state/region (for 32 countries); educational attainment level (see Table 2); employment status (see Table 3); self-reported achievement test scores (SAT – Critical Reading, SAT – Mathematics, and ACT Composite); parental educational attainment level (for 1 or 2 parents); and parental field of employment (for 1 or 2 parents). Participants were not required to provide any of these data except age and gender.

Figure 1: Participants by age and gender (males in blue, females in red). [Figure not reproduced; axes: Age, count.]

Figure 2: Participants by Body Mass Index. [Figure not reproduced; axes: Body Mass Index, count.]

Table 1: Marital Status.
Marital Status            Participants
Never Married             71,227
Married                   16,597
Domestic Partnership      6,070
Divorced & Single         2,470
Divorced & Remarried      497
Widowed & Single          97

Table 2: Educational Attainment Level.
Educational Level                                 Participants
Less than 12 years                                14,034
High school graduate                              5,995
Currently in college/university                   49,810
Some college/university, but did not graduate     4,868
College/university degree                         11,382
Currently in graduate or professional school      4,225
Graduate or professional school degree            6,644

Table 3: Employment Status.
Employment Status              Participants
Currently a student            48,716
Employed                       37,619
Not employed, seeking work     3,506
Not employed                   3,504
Homemaker                      1,686
Retired                        817

Materials
Items from 4 cognitive ability scales were administered. These scales were all developed in the lab of the second author for the purposes of online assessment of cognitive ability. The 4 scales contained a total of 60 items: 9 Letter and Number Series items, 11 Matrix Reasoning items, 16 Verbal Reasoning items and 24 Three-dimensional Rotation items. The Letter and Number Series ("LN") items prompt participants with short digit or letter sequences and ask them to identify the next position in the sequence from among six choices. Matrix Reasoning ("MR") items contain stimuli that are similar to those used in Raven's Progressive Matrices. The stimuli are 3x3 arrays of geometric shapes with one of the nine shapes missing. Participants are instructed to identify which of the six geometric shapes presented as response choices will best complete the stimuli. The Verbal Reasoning ("VR") items include a variety of logic, vocabulary and general knowledge questions. The Three-dimensional Rotation ("R3D") items present participants with cube renderings and ask participants to identify which of the response choices is a possible rotation of the target stimuli. None of the items were timed in these administrations, as untimed administration was expected to provide a more stringent and conservative evaluation of the items' utility when given online (there are no specific reasons precluding timed administrations of the ICAR items, whether online or offline).

For each of the 60 items, participants were instructed to choose the best answer from 8 response choices, including "None of these" and "I don't know". In order to maintain the integrity of the measures, the data provided here have already been scored. The raw unscored data may be obtained by contacting the first author. It should be noted that many useful analyses might be conducted on the unscored data, including evaluation of 'distractor' response choices and their relationship with both other items and demographic variables.

Procedures
The items were administered using the Synthetic Aperture Personality Assessment ("SAPA") technique [5], a variant of matrix sampling procedures discussed by Lord [3]. This method produces data which contain "massive missingness" by design [4]. This missingness qualifies for classification as missing completely at random ("MCAR" [2]) and it is further described as massively missing because the mean level of missingness by participant was approximately 68%. The number of administrations for each item varied considerably (median = 21,764; m = 19,998; sd = 10,958) as did the number of pairwise administrations between any two items in the set (median = 2,610; m = 4,240; sd = 4,110). The items were presented to participants in random order as part of a broader personality survey, and participants responded to as many items as they wished. The broader survey included items relating to a range of topics such as Big Five personality traits, vocational interests, creativity and more. The number of items administered to each participant was procedurally independent of participant response characteristics; participants were encouraged to complete 16 items. On average, participants responded to 12.4 (sd = 3.7; median = 12) of the ICAR items; it is not clear why some participants elected to complete fewer than 16 items, though it seems likely that participants skipped the items that were particularly challenging (it would be possible to explore this topic further using the data described here).

The feedback provided to participants on these cognitive ability items was informal and basic. Participants were told how many of the cognitive ability items were answered correctly out of those for which a response was given (e.g., "you answered 12 items correctly out of 14"). Participants were also informed about the average number of items answered correctly by previous participants of their age and gender, though no specific interpretative guidance was given about their score. For more information about the development and use of these measures, see Condon & Revelle [1].

Quality Control
The available data are presented largely as they were collected with only two exceptions:
1. Partial removal of data collected from participants who completed the survey more than once in a single browser session. This was done by assigning participants a random user ID that was persistent as long as their current browser session remained active. In those cases where more than 1 response set was entered in a single browser session, only the first response set was kept.
2. Removal of participants with self-reported ages younger than 14 or older than 90. The survey is not intended for participants younger than 14. Self-reported ages over 90 were removed on the grounds that they were deemed to be unlikely.

Ethical issues
No personally identifying information was collected from participants in these data.

(3) Dataset description
Object name
'sapaICARData18aug2010thru20may2013.rdata'
The data file is named to indicate the data collection method (SAPA), the source of the items (ICAR), and the time period over which the data were collected (18aug2010 through 20may2013). The file can be found at: http://dx.doi.org/10.7910/DVN/AD9RVY.

The rdata file includes three objects. The most pertinent of these is the raw data object ('sapaICARData18aug2010thru20may2013'). The remaining two objects are helper files for data analysis: 'ItemLists' is a list object that provides an index of the ICAR items associated with each item type, and 'superKey60' is a scoring matrix for the 4 ICAR scales (although the items have already been 'scored' as correct or incorrect, using 1s and 0s respectively, the scoring key remains useful for scoring the scales). There is also a text file ('demographic codes.txt') which describes the coding for all of the other variables in the data set.

The variable names within the data file have been coded with the acronyms "LN" for Letter and Number Series, "MR" for Matrix Reasoning, "R3D" for Three-Dimensional Rotation, and "VR" for Verbal Reasoning.

Data type
Self-report, cross-sectional survey data from 96,958 participants.

Format names and versions
The data are stored as a single rdata file (approximately 2.7 MB) and three separate csv files (approximately 32 MB in total). The rdata file includes the three objects described above: 'ItemLists', 'superKey60' and the main data object 'sapaICARData18aug2010thru20may2013'. Each of these three objects is also provided as an individual csv file. There is also an associated text file that provides full information on the demographic codes ('demographic codes.txt').

Data Collectors
The first and second authors were responsible for collecting all the data described in this dataset.
Language
All aspects of the survey and website were written in English. Data collected about the website through Google Analytics suggest that some participants used browser-based translation software, but no specifics are available about the extent and effect of these translations.

License
The data have been deposited under the open license CC0 (Public Domain Dedication).

Embargo
The data are freely available for use with appropriate citation.

Repository location
The data were published on Dataverse and are located at http://dx.doi.org/10.7910/DVN/AD9RVY.

Publication date
The dataset was published on September 23, 2015.

(4) Reuse potential
The data are well-suited for many types of structural and correlational analyses of cognitive abilities, including those aimed at reproducing or extending the analysis described by Condon & Revelle [1]. These might include evaluation of the ways in which the 4 item types relate to one another, evaluations of their shared structure, evaluations of structural relationships across constructs in various groups of participants, evaluations of differential item functioning, meta-analyses, or the development of new IRT-based adaptive measures. It should be noted that the feasibility of some of these analyses may be affected by the substantial missingness in the data. Additional, non-overlapping data sets from the SAPA Project are also available for use, including those which contain measures of personality and other constructs; contact the authors for more information.

Competing Interests
The authors declare that they have no competing interests.

Authors Information
The first author is David M. Condon, PhD, an Assistant Professor at Northwestern University's Feinberg School of Medicine in the Department of Medical Social Sciences. His contribution included a primary role in the technical development of the website through which the validation data were collected (sapa-project.org). This role involved the adaptation of existing code (primarily generated by the second author for previous data collection projects) and the authorship of new code (for extending the functionality and aesthetic design of the website and for improving the data storage and data exportation methods). The first author also took the lead role in cleaning the data to prepare it for sharing, making the data available in the Dataverse, and preparing and submitting this manuscript.

The second author, William Revelle, PhD, is credited with first implementing the survey sampling techniques that made this data collection possible (Synthetic Aperture Personality Assessment, "SAPA") and with developing earlier incarnations of the survey described on his website (personality-project.org). He also owns the url for the website where these data were collected (sapa-project.org). The second author played a secondary role in the development and maintenance of the website used to collect these data.

The first and second authors both contributed substantially to the design, development and psychometric validation of the ICAR item types included here. Melissa Mitchell (not listed as an author) also contributed to the original design of several items.

Acknowledgements
The authors would like to acknowledge Melissa Mitchell for her contribution in working with the second author to develop several of the ICAR items.

References
1. Condon, D M and Revelle, W 2014 The International Cognitive Ability Resource: Development and initial validation of a public-domain measure. Intelligence, 43: 52–64. DOI: http://dx.doi.org/10.1016/j.intell.2014.01.004
2. Graham, J W 2009 Missing Data Analysis: Making It Work in the Real World. Annual Review of Psychology, 60(1): 549–576. DOI: http://dx.doi.org/10.1146/annurev.psych.58.110405.085530
3. Lord, F M 1955 Sampling fluctuations resulting from the sampling of test items. Psychometrika, 20(1): 1–22. DOI: http://dx.doi.org/10.1007/bf02288956
4. Brown, A D 2015 (July) Standard Errors of SAPA Correlations: A Monte Carlo Analysis. In W. Revelle (Chair), Studying Individual Differences Using the Web: A Report from the SAPA Project. Symposium conducted at the biennial meeting of the International Society for the Study of Individual Differences, London, Ontario, CA.
5. Revelle, W, Wilt, J and Rosenthal, A 2010 Individual Differences in Cognition: New Methods for Examining the Personality-Cognition Link. In Gruszka, A, Matthews, G and Szymura, B (Eds.) Handbook of Individual Differences in Cognition. New York, NY: Springer New York, pp. 27–49. DOI: http://dx.doi.org/10.1007/978-1-4419-1210-7_2

Peer review comments: http://dx.doi.org/10.5334/jopd.25.pr

How to cite this article: Condon, D M and Revelle, W 2016 Selected ICAR Data from the SAPA-Project: Development and Initial Validation of a Public-Domain Measure. Journal of Open Psychology Data, 4: e1, DOI: http://dx.doi.org/10.5334/jopd.25

Published: 26 January 2016

Copyright: © 2016 The Author(s). This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License (CC-BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. See http://creativecommons.org/licenses/by/4.0/.

Journal of Open Psychology Data is a peer-reviewed open access journal published by Ubiquity Press.