Citations

 

Adnan, M. M. J., Hemmje, M. L., & Kaufmann, M. A. (2021). Social Media Mining to Study Social User Group by Visualizing Tweet Clusters using Word2Vec, PCA and K-Means. BIRDS+ WEPIR@ CHIIR, 40–51.

Amrouche, M. (2020, December 2). Short Text Topic Modeling. Medium. https://towardsdatascience.com/short-text-topic-modeling-70e50a57c883

Aggarwal, C. C., & Reddy, C. K. (2016). Data Clustering: Algorithms and Applications. CRC Press.

Armstrong, G., Martino, C., Rahman, G., Gonzalez, A., Vázquez-Baeza, Y., Mishne, G., & Knight, R. (2021). Uniform Manifold Approximation and Projection (UMAP) Reveals Composite Patterns and Resolves Visualization Artifacts in Microbiome Data. mSystems, 6(5), e00691-21. https://doi.org/10.1128/mSystems.00691-21

Bhandari, R., & Schultz, K. (2018, January 26). Elizabeth Hawley, Who Chronicled Everest Treks, Dies at 94. The New York Times. https://www.nytimes.com/2018/01/26/obituaries/elizabeth-hawley-who-chronicled-everest-treks-dies-at-94.html

Borrelli, D. (2021, October 20). Clustering sentence embeddings to identify intents in short text. Medium. https://towardsdatascience.com/clustering-sentence-embeddings-to-identify-intents-in-short-text-48d22d3bf02e

Caliński, T., & Harabasz, J. (1974). A dendrite method for cluster analysis. Communications in Statistics, 3(1), 1–27. https://doi.org/10.1080/03610927408827101

Collins-Thompson, K. (n.d.). K-means clustering [Video]. https://www.coursera.org/learn/siads543/lecture/Q0tGE/k-means-clustering
**Please note: this link requires Coursera access; otherwise it returns a 404 error.**

Collins-Thompson, K. (n.d.). Evaluating and labeling clusters [Video]. https://www.coursera.org/learn/siads543/lecture/UV04f/evaluating-and-labeling-clusters
**Please note: this link requires Coursera access; otherwise it returns a 404 error.**

Graph Data Modeling Fundamentals. (n.d.). Neo4j GraphAcademy. https://graphacademy.neo4j.com/courses/modeling-fundamentals/

Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 2nd Edition. (n.d.). Retrieved March 30, 2023, from https://learning.oreilly.com/library/view/hands-on-machine-learning/9781492032632/

Heil, N. (2008). Dark summit: The extraordinary true story of Everest’s most controversial season. London: Virgin Books.

Hennig, C. (2015). What are the true clusters? (arXiv:1502.02555). arXiv. https://doi.org/10.48550/arXiv.1502.02555

Himalayas—Study and exploration | Britannica. (n.d.). Retrieved April 1, 2023, from https://www.britannica.com/place/Himalayas/Study-and-exploration

Himalayan Database. (n.d.). Retrieved February 24, 2023, from https://billibierling.com/himalayan-database/

David Sharp and Everest Controversy on Mountainzone.com. (n.d.). Retrieved April 4, 2023, from https://www.mountainzone.com/2006/david_sharp/

Deisenroth, M. P., Faisal, A. A., & Ong, C. S. (n.d.). Mathematics for Machine Learning.

Grootendorst, M. (2020, October 6). Topic Modeling with BERT. Medium. https://towardsdatascience.com/topic-modeling-with-bert-779f7db187e6

Grootendorst, M. P. (n.d.). bertopic: BERTopic performs topic Modeling with state-of-the-art transformer models. (0.14.1) [Python; macOS, Microsoft Windows, POSIX, Unix]. Retrieved April 8, 2023, from https://github.com/MaartenGr/BERTopic

Jelodar, H., Wang, Y., Yuan, C., Feng, X., Jiang, X., Li, Y., & Zhao, L. (2019). Latent Dirichlet allocation (LDA) and topic modeling: Models, applications, a survey. Multimedia Tools and Applications, 78(11), 15169–15211. https://doi.org/10.1007/s11042-018-6894-4

Kessler, J. S. (2023). Scattertext 0.1.18 [Python]. https://github.com/JasonKessler/scattertext (Original work published 2016)

Krippendorff, K. (2004). Content Analysis (Second Edition). SAGE Publications.

Krippendorff, K. (2011). “Computing Krippendorff’s alpha-reliability.” Philadelphia: Annenberg School for Communication Departmental Papers. Retrieved July 6, 2011 from: http://repository.upenn.edu/cgi/viewcontent.cgi?article=1043&context=asc_papers

Kutuzov, A., & Kuzmenko, E. (2019). To Lemmatize or Not to Lemmatize: How Word Normalisation Affects ELMo Performance in Word Sense Disambiguation. Proceedings of the First NLPL Workshop on Deep Learning for Natural Language Processing, 22–28. https://aclanthology.org/W19-6203

Maaten, L. van der, & Hinton, G. (2008). Visualizing Data using t-SNE. Journal of Machine Learning Research, 9(86), 2579–2605.

Okazaki, N., & Ananiadou, S. (2006, May). Clustering acronyms in biomedical text for disambiguation. Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06). LREC 2006, Genoa, Italy. http://www.lrec-conf.org/proceedings/lrec2006/pdf/351_pdf.pdf

Odumuyiwa, V. T., Umeanozie, A., Sennaike, O., Adekola, O., Sawyerr, B., & Fasina, E. (2022). Clustering Based Approach for Ground Truth Inference in Crowdsourced Data. FUOYE Journal of Engineering and Technology, 7(2), 141–147. https://doi.org/10.46792/fuoyejet.v7i2.800

Pelgrim, R. (2022, March 8). Short-Text Topic Modelling: LDA vs GSDMM. Medium. https://towardsdatascience.com/short-text-topic-modelling-lda-vs-gsdmm-20f1db742e14

Peterson, R. A. (2000). A Meta-Analysis of Variance Accounted for and Factor Loadings in Exploratory Factor Analysis. Marketing Letters, 11(3), 261–275. https://doi.org/10.1023/A:1008191211004

Raschka, S., & Mirjalili, V. (2017). Python Machine Learning: Machine Learning and Deep Learning with Python, scikit-learn, and TensorFlow. Packt Publishing Ltd.

Reimers, N., & Gurevych, I. (2019). Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 3980–3990. https://doi.org/10.18653/v1/D19-1410

Salisbury, R. (2022). The Himalayan Database, The Expedition Archives of Elizabeth Hawley, Program Guide for Windows Himal 2.5. https://www.himalayandatabase.com/downloads/Himalayan%20Database%20Guide.pdf

Salisbury, R., Hawley, E., & Bierling, B. (2021). The Himalaya by the Numbers, A Statistical Analysis of Mountaineering in the Nepal Himalaya, 1950-2019. https://www.himalayandatabase.com/hbn2019.html

Savage, D. A., & Torgler, B. (2013). The Times They are a Changin’: The Effect of Institutions on Behavior, Cooperation, Emotional Attachment and Sentiment at 26,000 Ft (SSRN Scholarly Paper No. 2274593). https://doi.org/10.2139/ssrn.2274593

Schwartz, A. S., & Hearst, M. A. (2002). A Simple Algorithm for Identifying Abbreviation Definitions in Biomedical Text. Biocomputing 2003, 451–462. https://doi.org/10.1142/9789812776303_0042

Sir George Everest | British geodesist | Britannica. (n.d.). Retrieved April 1, 2023, from https://www.britannica.com/biography/George-Everest

Speech and Language Processing. (n.d.). Retrieved April 10, 2023, from https://web.stanford.edu/~jurafsky/slp3/

Yin, J., & Wang, J. (2014). A dirichlet multinomial mixture model-based approach for short text clustering. Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 233–242. https://doi.org/10.1145/2623330.2623715

Zamanighomi, M., Lin, Z., Daley, T., Chen, X., Duren, Z., Schep, A., Greenleaf, W. J., & Wong, W. H. (2018). Unsupervised clustering and epigenetic classification of single cells. Nature Communications, 9(1), Article 1. https://doi.org/10.1038/s41467-018-04629-3

Zhao, K. (2019, December 10). Feature Extraction using Principal Component Analysis—A Simplified Visual Demo. Medium. https://towardsdatascience.com/feature-extraction-using-principal-component-analysis-a-simplified-visual-demo-e5592ced100a

Statement of Work (April 2023)

Simi Talkar

- Project scout and environment setup
- Exploratory Data Analysis
- Dash App (lead and creator)
- Docker container
- Scraping and API retrieval of social media data and analysis (Twitter/Reddit)
- Final write-up (lead)

Brian Seko

- Data cleaning and structure
- Route Memo Clustering
- Route Memo Topic Modeling
- Climbing Period Feature Analysis (not included here)
- Final write-up (lead)

Matthieu Lienart

- Scraping of additional Himalayan peak data
- Data cleaning and structure
- Neo4j Database
- Network Analysis
- Poster Creation (lead)
- Website (lead)
- Final write-up